Scalable, adaptable, and manageable system for multimedia identification

ABSTRACT

An architecture for a multimedia search system is described. To perform similarity matching of multimedia query frames against reference content, a reference database is built that comprises a cluster index, which uses cluster keys to perform similarity matching, and a multimedia index, which is used to perform sequence matching. Methods to update and maintain the reference database that enable addition and removal of multimedia content, including portions of multimedia content, from the reference database in a running system are described. Hierarchical multi-level partitioning methods to organize the reference database are presented. Smart partitioning of the reference multimedia content according to the nature of the multimedia content, and according to its popularity in social media, that supports scalable fast multimedia identification is also presented. A caching mechanism for multimedia search queries in a centralized or in a decentralized distributed system and a client based local multimedia search system enabling multimedia tracking are described.

This application is a continuation of U.S. patent application Ser. No. 14/151,335 filed Jan. 9, 2014 and U.S. patent application Ser. No. 14/151,294 filed Jan. 9, 2014 and issued as U.S. Pat. No. 8,965,863, which are both divisionals of U.S. patent application Ser. No. 13/102,479 filed May 6, 2011 issued as U.S. Pat. No. 8,655,878, which claims the benefit of U.S. Provisional Patent Application Ser. No. 61/331,965 entitled “Scalable, Adaptable, and Manageable System for Multimedia Identification” filed May 6, 2010, all of which are hereby incorporated by reference in their entirety.

CROSS REFERENCE TO RELATED APPLICATIONS

U.S. patent application Ser. No. 14/151,294 filed Jan. 9, 2014 entitled “A Scalable, Adaptable, and Manageable System for Multimedia Identification”, U.S. patent application Ser. No. 12/141,163 filed Jun. 18, 2008 entitled “Methods and Apparatus for Providing a Scalable Identification of Digital Video Sequences”, U.S. patent application Ser. No. 12/141,337 filed on Jun. 18, 2008 entitled “Methods and Apparatus for Multi-Dimensional Content Search and Video Identification”, U.S. patent application Ser. No. 12/491,896 filed Jun. 25, 2009 entitled “Digital Video Fingerprinting Based on Resultant Weighted Gradient Orientation Computation”, U.S. patent application Ser. No. 12/612,729 filed Nov. 5, 2009 entitled “Digital Video Content Fingerprinting Based on Scale Invariant Interest Region Detection with an Array of Anisotropic Filters”, U.S. patent application Ser. No. 12/772,566 filed May 3, 2010 entitled “Media Fingerprinting and Identification System”, U.S. patent application Ser. No. 12/788,796 filed May 27, 2010 entitled “Multi-Media Content Identification Using Multi-Level Content Signature Correlation and Fast Similarity Search”, U.S. patent application Ser. No. 12/955,416 filed Nov. 29, 2010 entitled “Digital Video Content Fingerprinting Using Image Pixel Intensity and Color Information”, and U.S. patent application Ser. No. 13/076,628 filed Mar. 31, 2011 entitled “Scale/Affine Invariant Interest Region Detection with an Array of Anisotropic Filters for Video Fingerprinting” have the same assignee as the present application, are related applications, and are hereby incorporated by reference in their entirety.

BACKGROUND OF THE INVENTION

Media applications which include video and audio database management, database browsing and identification are undergoing explosive growth and are expected to continue to grow. To address this growth, there is a need for a comprehensive solution related to the problem of creating a multimedia sequence database and identifying, within such a database, a particular multimedia sequence or sequences that are tolerant of media content distortions. Multiple applications include video database mining, copyright content detection for video hosting web-sites, contextual advertising placement, and broadcast monitoring of video programming and advertisements.

Multimedia fingerprinting refers to the ability to generate associated identifying data, referred to as a fingerprint, from the multimedia image, audio and video content. A fingerprint ideally has several properties. First, the fingerprint should be much smaller than the original data. Second, the fingerprint should be designed such that it can be searched for in a large database of fingerprints. Third, the original multimedia content should not be able to be reconstructed from the fingerprint. Fourth, for multimedia content that is a distorted version of another multimedia content, fingerprints of the original and distorted versions should be similar. Examples of some common multimedia distortions include selecting a clip of video content temporally, cropping the image data, re-encoding the image or audio data to a lower bit rate, changing a frame rate of the video or audio content, re-recording the multimedia data via some analog medium such as a camcorder in a movie theatre, and changing the aspect ratio of the image content. A fingerprint with the fourth property is deemed to be robust against such distortions. Such a system of fingerprinting and search is preferable to other methods of content identification. For example, multimedia watermarking changes the multimedia content by inserting watermark data. Unlike multimedia watermarking, fingerprinting does not change the content.

Fingerprinting is a very challenging problem. Developing a scalable system that can easily be managed, changed, and replicated is likewise a challenging system problem.

Increasing demand for such fingerprinting and search solutions, which include standard definition (SD) and high definition (HD) formats of video, three dimensional (3D) videos, and virtual reality media content, requires increasing sophistication, flexibility, and performance in the supporting algorithms and hardware. The sophistication, flexibility, and performance that are desired exceed the capabilities of current generations of software based solutions, in many cases, by an order of magnitude.

SUMMARY OF THE INVENTION

In one or more of its several aspects, the present invention recognizes and addresses problems such as those described above. To such ends, an embodiment of the invention addresses a method for creating a distributed reference multimedia database. A reference multimedia database is split into a first identifiable portion and a second identifiable portion. The first identifiable portion is stored at a first search server and the second identifiable portion is stored at a second search server. A first query is assigned to the first search server based on the first identifiable portion stored therein and a second query is assigned to the second search server based on the second identifiable portion stored therein. The first query is searched for at the first search server in parallel with the second query searched for at the second search server to find a first content stored in the first identifiable portion and a second content stored in the second identifiable portion that have a close match to the associated first query and to the associated second query.

Another embodiment of the invention addresses a method for creating a tiered multimedia reference database. A reference multimedia database is split into a first identifiable portion having multimedia content representing the most sought content and a second identifiable portion representing the remaining content. The first identifiable portion is stored at a first search server and the second identifiable portion is stored at a second search server coupled to the first search server. A first query is assigned to the first search server. The first query is searched for at the second search server if a search for the first query at the first search server is not successful to find multimedia content stored in the first search server.

Another embodiment of the invention addresses a method of tracking units of multimedia content. Reference signatures retrieved from a remote search server for currently displayed units of multimedia content are stored in a client device. Matches in query results are detected for succeeding displayed units of multimedia content using a track search approach to track current displayed units of multimedia content. Display of succeeding units of multimedia content is blocked upon a no match indication detected in the query results.

Another embodiment of the invention addresses a method for query caching. Signatures of a video sequence are generated at a client. A cache key is generated from the signatures at the client. A search of a remote reference database is requested using the cache key. The remote reference database is searched for a match with the cache key. Results linked with a matching cache key are sent to the client, wherein the results were generated from a previous full search of the reference database.

A further embodiment of the invention addresses a method of signature database organization by a cluster index. A first set of first signature records for units of multimedia content are stored in a cluster index data structure, wherein the first signature records are grouped by a cluster key. A second set of second signature records for units of multimedia content are stored in a multimedia signature index data structure, wherein the second signature records are grouped by a multimedia identification. Signatures are shared in the single server search system between the cluster index data structure and multimedia signature index data structure.

These and other features, aspects, techniques and advantages of the present invention will be apparent to those skilled in the art from the following detailed description, taken together with the accompanying drawings and claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates a fingerprinting and search system for media content fingerprinting and identification in accordance with the present invention;

FIG. 2A illustrates a reference media database generation process in accordance with the present invention;

FIG. 2B illustrates a query fingerprint generation process in accordance with the present invention;

FIG. 2C illustrates a similarity search process in accordance with the present invention;

FIG. 2D illustrates a candidate multimedia filtering process in accordance with the present invention;

FIG. 2E illustrates a signature correlation process in accordance with the present invention;

FIG. 2F illustrates an exemplary signature database organized by a cluster key index in accordance with the present invention;

FIG. 2G illustrates an exemplary signature database organized by a multimedia signature index organization in accordance with the present invention;

FIG. 3A illustrates an exemplary cluster key generation from a media fingerprint in accordance with the present invention;

FIG. 3B illustrates an exemplary process to generate a cluster key using metadata information in accordance with the present invention;

FIG. 3C illustrates an exemplary process to generate a final cluster key in accordance with the present invention;

FIG. 4A illustrates an exemplary primary cluster index link structure using the cluster keys, fingerprints, and metadata in accordance with the present invention;

FIG. 4B illustrates an exemplary multiple cluster index database in accordance with the present invention;

FIG. 4C illustrates a process to create a cluster index structure in accordance with the present invention;

FIG. 5A illustrates an exemplary signature database organized by multimedia ids that are used as multimedia signature indexes to fingerprint arrays in accordance with the present invention;

FIG. 5B illustrates a process to create a multimedia signature index in accordance with the present invention;

FIG. 5C illustrates a data sharing organization that is configured with an exemplary cluster index and multimedia index data structure that supports sharing multimedia fingerprints in accordance with the present invention;

FIG. 5D illustrates another exemplary cluster index and multimedia index data structure that supports sharing multimedia fingerprints in accordance with the present invention;

FIG. 6A illustrates a process for updating a cluster index structure when fingerprints for a new multimedia content are received in accordance with the present invention;

FIG. 6B illustrates an exemplary state diagram showing a cluster index structure before and after an update that added fingerprints for a new multimedia in accordance with the present invention;

FIG. 6C illustrates a process for updating the cluster index structure where an exclusive cluster lock is obtained for all of the clusters that need more space for new signatures in accordance with the present invention;

FIG. 7A presents a process for updating a cluster index structure to remove fingerprints associated with a multimedia content to be deleted in accordance with the present invention;

FIG. 7B illustrates an exemplary state diagram showing a primary index structure before and after deleting fingerprints for multimedia content to be deleted in accordance with the present invention;

FIG. 7C illustrates a process to delete fingerprints from the multimedia index in accordance with the present invention;

FIG. 7D illustrates an exemplary state diagram that shows the multimedia signature index structure before and after content has been deleted from it according to the present invention;

FIG. 8A illustrates an exemplary state diagram showing a cluster index structure before and after an intermediate live event update for a portion of a multimedia content in accordance with the present invention;

FIG. 8B illustrates an exemplary state diagram showing a multimedia index structure before and after an intermediate live event update for a portion of multimedia content in accordance with the present invention;

FIG. 8C illustrates an exemplary process to update the multimedia signature index for a live event update in accordance with the present invention;

FIG. 9 illustrates an exemplary state diagram showing a cluster index structure before and after deleting a portion of a multimedia content that was previously added using a live event update process in accordance with the present invention;

FIG. 10A illustrates an exemplary system having multiple search servers and a distributed reference database in accordance with the present invention;

FIG. 10B illustrates an exemplary process to decide how to add multimedia content to a distributed reference database in accordance with the present invention;

FIG. 11 illustrates an exemplary multi-cluster multimedia identification system in accordance with the present invention;

FIG. 12 illustrates an exemplary three tier hierarchical system for multimedia identification in accordance with the present invention;

FIG. 13 illustrates an exemplary two tier system for a multiple tier multi-cluster system that achieves reference database scaling as well as performance scaling in accordance with the present invention;

FIG. 14A illustrates an exemplary process for multimedia identification and matched multimedia tracking at a local client in accordance with the present invention;

FIG. 14B illustrates an exemplary state diagram of various search queries done at a client that has the ability to perform a local multimedia track function in accordance with the present invention;

FIG. 15A illustrates an exemplary search process operating on a search server with query caching functionality in accordance with the present invention;

FIG. 15B illustrates a distributed search system process that incorporates centralized and distributed cache servers in accordance with the present invention; and

FIG. 15C illustrates a process executed at query clients for cache based multimedia content search in accordance with the present invention.

DETAILED DESCRIPTION

The present invention will now be described more fully with reference to the accompanying drawings, in which several embodiments of the invention are shown. This invention may, however, be embodied in various forms and should not be construed as being limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.

It will be appreciated that the present disclosure may be embodied as methods, systems, or computer program products. Accordingly, the present inventive concepts disclosed herein may take the form of a hardware embodiment, a software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present inventive concepts disclosed herein may take the form of a computer program product on a computer readable storage medium having non-transitory computer usable program code embodied in the medium. Any suitable computer readable medium may be utilized including hard disks, CD-ROMs, optical storage devices, flash memories, or magnetic storage devices.

Computer program code or software programs that are operated upon or for carrying out operations according to the teachings of the invention may be written in a high level programming language such as C, C++, JAVA, Smalltalk, JavaScript®, Visual Basic®, TSQL, Python, Ruby, Perl, use of .NET™ Framework, Visual Studio® or in various other programming languages. Software programs may also be written directly in a native assembler language for a target processor. A native assembler program uses instruction mnemonic representations of machine level binary instructions. Program code or computer readable medium as used herein refers to code whose format is understandable by a processor. Software embodiments of the disclosure do not depend upon their implementation with a particular programming language.

The methods described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. A computer-readable storage medium may be coupled to the processor through local connections such that the processor can read information from, and write information to, the storage medium or through network connections such that the processor can download information from or upload information to the storage medium. In the alternative, the storage medium may be integral to the processor.

Systems and methods are described that are highly scalable to very large multimedia databases. A reference multimedia database can be modified by adding a unit of multimedia content or removing an existing unit of multimedia content while it is being used for multimedia identification. A unit of multimedia content may be a frame or a sequence of frames of a video, an audio clip, or other multimedia formatted data, such as frames from a movie. A unit of multimedia content may also be a television show without advertisements, an advertisement, a song, or similar unit of communication. The search system can be tuned to the desired speed of multimedia matching by centralized and distributed systems, by replication of individual search machines or search machine clusters, by use of a hierarchical tier of search machines and reference databases, by partitioning of reference databases, by multimedia query caching, by local client search methods, by client tracking, or by combinations of the previously mentioned arrangements. As an example, the search system can be implemented in a centralized client server model, or as a distributed system, or by a combination of such approaches. Also, a distributed search system may be operable on a variety of distributed networks, such as a peer to peer (P2P) system. In addition, search functions or a complete search operation may be operable at the client.

The following nomenclature is used in describing the present invention. For example, multimedia content represents any video, audio or audio-visual content. Multimedia content may also represent a series of photographs or pictures, a series of audio files, or other associated data, such as 3D video content or 4D content in which sensory feedbacks such as touch feedback sensations are presented simultaneously with visual and audio content.

The terms signature and fingerprint both denote the same structure of a sequence of bits and may be used interchangeably. A fingerprint is generated to represent a unit of multimedia content using a fingerprinting method that operates on the unit of multimedia content. A cluster key is a type of hash key. A cluster index is a data structure that holds all of the signatures that have the same cluster key. A multimedia signature index is a data structure that is used to hold signatures associated with a unit of multimedia content.

A number of exemplary goals of a multimedia identification system include an ability to handle large capacity multimedia databases and high density media files. The multimedia identification system is to provide high performance and respond with accurate media identification when queried. Also, the overall design should be scalable to efficiently handle increasing capacity of the multimedia databases and an arbitrary length of a query sequence.

To provide for such needs, FIG. 1 illustrates a fingerprinting and search system 100 for both media fingerprinting and identification in accordance with an embodiment of the present invention. The fingerprinting and search system 100 includes user sites 102 and 103, a server 106, a video database 108, and a video fingerprinting and video identification process 112 operated, for example, by user site 102. A network 104, such as the Internet, a wireless network, or a private network, connects sites 102 and 103 and server 106. Each of the user sites 102 and 103 and the server 106 may include a processor complex having one or more processors, having internal program storage and local user controls such as a monitor, a keyboard, a mouse, a printer, and may include other input or output devices, such as an external file storage device and communication interfaces.

The user site 102 may comprise, for example, a personal computer, a laptop computer, a tablet computer, or the like equipped with programs and interfaces to support data input and output and video fingerprinting and search monitoring that may be implemented both automatically and manually. The user site 102, for example, may store programs, such as the video fingerprinting and search process 112 which is an implementation of a content based video identification process of the present invention. The user site 102 may also have access to such programs through electronic media, such as may be downloaded over the Internet from an external server, accessed through a universal serial bus (USB) port from flash memory, accessed from disk media of various types, or the like. The fingerprinting and search system 100 may also suitably include more servers and user sites than shown in FIG. 1. Also, multiple user sites each operating an instantiated copy or version of the video fingerprinting and search process 112 may be connected directly to the server 106 while other user sites may be indirectly connected to it over the network 104.

User sites 102 and 103 may generate user video content which is uploaded over the Internet 104 to a server 106 for storage in the video database 108. The user sites 102 and 103, for example, may also operate a video fingerprinting and video identification process 112 to generate fingerprints and search for video content in the video database 108. The video fingerprinting and video identification process 112 in FIG. 1 is scalable and utilizes highly accurate video fingerprinting and identification technology as described in more detail below. The process 112 is operable to check unknown video content against a database of previously fingerprinted video content, which is considered an accurate or “golden” database. The video fingerprinting and video identification process 112 is different in a number of aspects from commonly deployed processes. For example, the process 112 extracts features from the video itself rather than modifying the video. The video fingerprinting and video identification process 112 allows the server 106 to configure a “golden” database specific to its business requirements. For example, general multimedia content may be filtered according to a set of guidelines for acceptable multimedia content that may be stored on the business system. The user site 102, which is configured to connect with the network 104, uses the video fingerprinting and search process 112 to compare local video streams against a previously generated database of signatures in the video database 108.

The video database 108 may store video archives, as well as data related to video content stored in the video database 108. The video database 108 also may store a plurality of video fingerprints that have been adapted for use as described herein and in accordance with the present invention. It is noted that depending on the size of an installation, the functions of the video fingerprinting and search process 112 and the management of the video database 108 may be combined in a single processor system, such as user site 102 or server 106, and may operate as directed by separate program threads for each function.

The fingerprinting and search system 100 for both media fingerprinting and identification is readily scalable to very large multimedia databases, has high accuracy in finding a correct clip, has a low probability of misidentifying a wrong clip, and is robust to many types of distortion. The fingerprinting and search system 100 uses one or more fingerprints for a unit of multimedia content that are composed of a number of compact signatures, including cluster keys and associated metadata. The compact signatures and cluster keys are constructed to be easily searchable when scaling to a large database of multimedia fingerprints. The multimedia content is also represented by many signatures that relate to various aspects of the multimedia content that are relatively independent from each other. Such an approach allows the system to be robust to distortion of the multimedia content even when only small portions of the multimedia content are available.

Multimedia, specifically audio and video content, may undergo several different types of distortions. For instance, audio distortions may include re-encoding to different sample rates, rerecording to a different audio quality, introduction of noise and filtering of specific audio frequencies or the like. Video distortions may include cropping, stretching, re-encoding to a lower quality, using image overlays, or the like. While these distortions change the digital representation, the multimedia is perceptually similar to undistorted content to a human listener or viewer. Robustness to these distortions refers to a property that content that is perceptually similar will generate fingerprints that have a small distance according to some distance metric, such as Hamming distance for bit based signatures. Also content that is perceptually distinct from one another will generate fingerprints that have a large distance, according to the same distance metric. A search for perceptually similar content, hence, is transformed to a problem of searching for fingerprints that are a small distance away from the desired fingerprints.
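The distance property described above can be illustrated with a minimal sketch for bit based signatures. The 16-bit signature values, the function names, and the threshold of 3 bits are hypothetical and chosen only for illustration; they are not taken from the described system.

```python
def hamming_distance(sig_a: int, sig_b: int) -> int:
    """Count the bit positions in which two equal-length signatures differ."""
    return bin(sig_a ^ sig_b).count("1")


def is_similar(sig_a: int, sig_b: int, max_distance: int) -> bool:
    """Treat two signatures as similar when their distance is small enough."""
    return hamming_distance(sig_a, sig_b) <= max_distance


# Hypothetical 16-bit signatures (illustrative values only).
original = 0b1011010100101010
distorted = 0b1011010100101110   # mildly distorted copy: one bit differs
unrelated = 0b0100101011010101   # distinct content: every bit differs

print(hamming_distance(original, distorted))   # small distance
print(hamming_distance(original, unrelated))   # large distance
print(is_similar(original, distorted, max_distance=3))
```

In this sketch a search for perceptually similar content reduces to finding reference signatures whose distance to the query signature falls below the chosen threshold.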

FIG. 2A illustrates a reference media database generation process 200 in accordance with the present invention. Reference units of multimedia content, such as video or movie clips 202_1, 202_2, . . . 202_N that are relevant to the application at hand are identified. The clips 202_1, 202_2, . . . 202_N refer to distinct units of multimedia content. For example, the clips could be from a movie and represent different temporal portions of the movie, or they could be from different movies. Using a video fingerprinting function 203 of the video fingerprinting and search process 112, reference signatures 204_1, 204_2, . . . 204_M are generated for the reference multimedia clips 202_1, 202_2, . . . 202_N, respectively, along with hashing data and associated metadata, where M may be different than N. Different pieces of multimedia content may be fingerprinted independently, leading to a parallelizable system. The set of reference signatures 204_1, 204_2, . . . 204_M created by the video fingerprinting function 203 is organized by database generation function 206 into a reference database 208. This set of reference signatures is indexed by the generated hashing data, described in further detail below. The associated metadata, also described in further detail below, is stored along with each reference signature. A set of reference signatures may also be indexed in other ways, for instance, by multimedia identifiers. A single multimedia identifier denotes a distinct piece of multimedia content. For instance, the multimedia clips 202_1, 202_2, . . . 202_N would each be represented by their own multimedia identifier.

FIG. 2B illustrates a query fingerprint generation process 220 in accordance with the present invention. A user requests identification of an unknown multimedia clip 222, also referred to herein as a query multimedia clip 222. The query multimedia clip 222 is processed by the video fingerprinting function 203 to generate query signatures, hash data, and associated metadata, known collectively as a query fingerprint 224, for the unknown multimedia clip 222.

FIG. 2C illustrates a similarity search process 230 in accordance with the present invention. For each query signature in the query fingerprint 224, a similarity search function 232 is initiated to find similar signatures in the reference database 208. The hash data associated with each query signature is used to restrict the similarity search function 232 to a relatively small portion of the reference data, allowing the similarity search to be extremely fast even for large reference databases. Only reference signatures that are “similar” within a distance measure to the query signature are returned. These classified similar reference signatures are added to a candidate list 234, which contains identifying information regarding the reference multimedia clip 202_1, 202_2, . . . 202_N to which each similar reference signature belongs.
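A minimal sketch of such a hash restricted similarity search follows. It assumes, purely for illustration, that the hash data is a cluster key, that signatures are small integers compared by Hamming distance, and that the reference data is held in a Python dictionary; the names `build_reference_index` and `similarity_search` and all values are hypothetical.

```python
from collections import defaultdict


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


def build_reference_index(records):
    """Group (cluster_key, signature, clip_id) records by cluster key."""
    index = defaultdict(list)
    for cluster_key, signature, clip_id in records:
        index[cluster_key].append((signature, clip_id))
    return index


def similarity_search(index, query_fingerprint, max_distance):
    """Return candidates (clip_id, reference signature, distance).

    Only the bucket selected by each query signature's cluster key is
    scanned, so most of the reference data is never touched.
    """
    candidates = []
    for cluster_key, query_sig in query_fingerprint:
        for ref_sig, clip_id in index.get(cluster_key, []):
            distance = hamming(query_sig, ref_sig)
            if distance <= max_distance:
                candidates.append((clip_id, ref_sig, distance))
    return candidates


# Hypothetical reference records: (cluster key, signature, clip identifier).
reference_records = [
    (0x52AA, 0b11110000, "clip_1"),
    (0x52AA, 0b00001111, "clip_2"),
    (0xB52A, 0b11110001, "clip_3"),
]
reference_index = build_reference_index(reference_records)

query = [(0x52AA, 0b11110001)]
candidate_list = similarity_search(reference_index, query, max_distance=2)
print(candidate_list)
```

Note that in this sketch the signature of `clip_3` is never compared, because it is stored under a different cluster key. Restricting the search in this way is what keeps it fast, and it is one reason the cluster keys themselves need to be robust to distortion.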

FIG. 2D illustrates a candidate video filtering process 240 in accordance with the present invention. The video filtering process 240 analyzes the candidate list 234 for the most likely matches in the reference database 208. The candidate list 234 is sorted in top multimedia clips function 242 to find the top most likely matching multimedia clips. The resulting data is stored in a list of top clips 244. The list of top clips 244 includes a multimedia identifier for the similar reference multimedia clip. A most likely matching multimedia clip might be only for a specific portion of the multimedia clip, for instance, a particular time segment, such as seconds 93 to 107 of a video sequence, or spatial locations, such as top left quadrant in each of the clip's video frames. The temporal identification and spatial locations are also included in the list of top clips.

FIG. 2E illustrates a signature correlation process 250 in accordance with the present invention. The list of top clips 244 is selected for correlation. For each of the clips in the top clips list 244, a set of reference signatures is accessed from the reference database 208 based on the multimedia identifier and any temporal and/or spatial information. A query could correspond to “all reference signatures for video number ABC from time 10.4 seconds to 25.7 seconds in the bottom-right quadrant of the frame”. These reference signatures are not restricted to have matching cluster keys and the criteria to select a subset of them can be further specified. This set of reference signatures is compared against the query signatures using a signature correlation procedure 252. For each query signature, a score is derived based on the number of matches and the distances to the closest signatures in the set of selected reference signatures. The distances measured could include, for instance, the average distance. These scores are combined, for example, the scores are averaged, for the entire set of query signatures to give an overall score for a particular reference database clip. Based on whether this score is over a threshold, the reference database clip is determined to be a true positive or a false positive. The signature correlation procedure 252 is repeated for all clips in the top clips list 244, to produce a list of matching reference videos, since there may be more than one, if similar content is repeated in the database, for example. The fingerprinting function 203 of FIG. 2A and FIG. 2B belongs to a multimedia fingerprinting system of the fingerprinting and search system 100, while the database generation function 206 of FIG. 2A, the similarity search process 230 of FIG. 2C, the candidate video filtering process 240 of FIG. 2D, and the signature correlation process 250 of FIG. 2E belong to a search system of the fingerprinting and search system 100.
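The scoring and thresholding steps of such a correlation can be sketched as follows. The text does not give a scoring formula, so the per-signature score used here (one minus the normalized distance to the closest reference signature, or zero when no reference signature is within `max_distance`) is an assumption made only for illustration, as are the function names, the 16-bit signature width, and the threshold value.

```python
def hamming(a: int, b: int) -> int:
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")


def correlation_score(query_sigs, ref_sigs, bits=16, max_distance=3):
    """Average per-query-signature score for one reference clip.

    Assumed scoring rule (not from the source): a query signature scores
    1 - closest/bits if its closest reference signature is within
    max_distance, and 0.0 otherwise.
    """
    if not query_sigs:
        return 0.0
    scores = []
    for q in query_sigs:
        closest = min((hamming(q, r) for r in ref_sigs), default=bits)
        scores.append(1.0 - closest / bits if closest <= max_distance else 0.0)
    return sum(scores) / len(scores)


def is_match(query_sigs, ref_sigs, threshold=0.8):
    """Accept the reference clip when the overall score is over the threshold."""
    return correlation_score(query_sigs, ref_sigs) > threshold


# Hypothetical query and reference signatures for one candidate clip.
query_signatures = [0b1111000011110000, 0b0000111100001111]
reference_signatures = [0b1111000011110001, 0b0000111100001111]

print(correlation_score(query_signatures, reference_signatures))
print(is_match(query_signatures, reference_signatures))
```

Repeating `is_match` for each clip in a top clips list would yield the list of matching reference clips; clips scoring below the threshold would be treated as false positives.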

FIG. 2F illustrates an exemplary signature database 2000 organized by a cluster key index in accordance with the present invention. The signature records for all the multimedia content that is to be put into the signature database 2000 are collected together and grouped by a cluster key. At this stage of processing, the number of signatures that belong to a particular cluster key is known, so the memory space for the signature records can be allocated and signature records may be stored in the memory. The signature records stored in the memory are not considered fixed and unchangeable, and dynamic updates to the signature records may be added as described in more detail below.

It is advantageous for search operations that the signature records for a particular cluster key be stored contiguously. The set of signatures belonging to a cluster key is called a cluster. For example, 100 signatures and corresponding cluster keys may be generated having 50 signatures with a cluster key A, 30 signatures with a cluster key B and 20 signatures with a cluster key C. Thus, the 100 signatures are organized into three clusters, cluster A with 50 signatures, cluster B with 30 signatures and cluster C with 20 signatures that are stored in memory.

For each cluster key, the number of signatures and a pointer to the location where the corresponding signature records begin are stored for processing. Since the space of cluster keys may be relatively small (for example, a 16-bit cluster key implies a maximum of 65,536 entries), cluster keys, including other possible sizes such as 24-bit or 32-bit cluster keys, can be used as indexes to locate signature clusters in an array. Alternative arrangements of signatures, such as organizations in the form of a binary tree or in the form of a B-tree or similar data structures, may also be used. However, aspects of using the cluster keys as indexes in an array are discussed further below.

As shown in FIG. 2F, a cluster key array 2002 stores one element for each possible cluster key. The index into the cluster key array 2002 is the integer interpretation of the cluster key as a binary number. Thus, given a cluster key, direct addressing into the array 2002 retrieves the number of matching signatures and where corresponding signature records are located, such as a link reference address (LRA) to a list of signature records 2004 or 2006. In FIG. 2F, for example, cluster key “0101001010101010” is located at entry 2008, which links to the array of signature records 2004, and cluster key “1011010100101010” is located at entry 2010, which links to the array of signature records 2006. Each entry in cluster key array 2002, such as entries 2008 and 2010, has an additional field included in the entry that stores the link reference address (LRA) to a signature record array, such as signature record arrays 2004 and 2006.
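The direct addressing described above might be sketched as follows, assuming a 16-bit cluster key. The class name `ClusterKeyArray` and its methods are hypothetical names chosen for this illustration, and a Python list reference stands in for the link reference address (LRA).

```python
KEY_BITS = 16  # assumed key width; 24-bit or 32-bit keys would enlarge the array

class ClusterKeyArray:
    """One entry per possible cluster key: (signature count, link reference)."""

    def __init__(self):
        self.entries = [(0, None)] * (1 << KEY_BITS)

    def put(self, key_bits, records):
        # The array index is the integer interpretation of the binary key.
        self.entries[int(key_bits, 2)] = (len(records), records)

    def lookup(self, key_bits):
        # Direct addressing: no search is needed to reach the entry.
        return self.entries[int(key_bits, 2)]

array_2002 = ClusterKeyArray()
array_2002.put("0101001010101010", ["sig-a", "sig-b"])
count, records = array_2002.lookup("0101001010101010")
```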

FIG. 2G illustrates an exemplary signature database 2050 organized by a multimedia signature index organization in accordance with the present invention. The multimedia signature index organization supports accesses of the signature records corresponding to a multimedia identification (id), from a starting playout time to an ending playout time. The multimedia signature index organization is useful for the signature correlation process, discussed above with regard to FIG. 2E. The multimedia signature index organization is based on a hash table 2052 organized by multimedia id. For example, a hash table entry 2058 stores a pointer to the data structure 2060 that holds metadata and all of the signatures for this multimedia id “vo102340910” stored in an array 2054. The signatures are stored in playout timestamp order within every array 2054, 2056 in the signature database 2050.
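Because the signatures are kept in playout timestamp order, a time range access can be served by binary search. The following is a minimal sketch under that assumption; the class `MultimediaSignatureIndex`, its field names, and the use of a Python dictionary as the hash table are illustrative choices, not details taken from FIG. 2G.

```python
from bisect import bisect_left, bisect_right

class MultimediaSignatureIndex:
    """Hash table keyed by multimedia id; signatures kept in timestamp order."""

    def __init__(self):
        self.table = {}

    def add(self, media_id, timed_signatures, metadata=None):
        ordered = sorted(timed_signatures, key=lambda ts: ts[0])
        self.table[media_id] = {
            "metadata": metadata or {},
            "times": [t for t, _ in ordered],
            "signatures": [s for _, s in ordered],
        }

    def signatures_between(self, media_id, start, end):
        """Return signatures with start <= playout time <= end."""
        entry = self.table[media_id]
        lo = bisect_left(entry["times"], start)
        hi = bisect_right(entry["times"], end)
        return entry["signatures"][lo:hi]

index_2052 = MultimediaSignatureIndex()
index_2052.add("vo102340910", [(12.0, "s2"), (5.0, "s1"), (30.0, "s3")])
selected = index_2052.signatures_between("vo102340910", 10.4, 25.7)
```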

The exemplary signature database 2000 and the exemplary signature database 2050 may be stored either in a local computer's main memory, such as RAM, or on a hard disk drive. One embodiment is to store one or both of the video database structures in main memory as access speeds are significantly faster. A performance versus capacity tradeoff may be made concerning the remaining capacity of main memory versus the remaining capacity of the hard drive once the video database structures are stored.

An exemplary embodiment of signature formation, also referred to as fingerprinting, and database formation is described in U.S. patent application Ser. No. 12/141,163 filed Jun. 18, 2008, FIGS. 11-16 and page 25, line 3 to page 28, line 18. Another exemplary embodiment of fingerprinting and database formation is described in U.S. patent application Ser. No. 12/612,729 filed Nov. 5, 2009, FIGS. 12-14 and page 28, line 20 to page 31, line 13. Another exemplary embodiment of fingerprinting and database formation is described in U.S. patent application Ser. No. 12/491,896 filed Jun. 25, 2009, FIGS. 8-10 and page 20, line 8 to page 24, line 22. Another exemplary embodiment of fingerprinting and database formation is described in U.S. patent application Ser. No. 12/772,566 filed May 3, 2010, FIGS. 4-9B and page 23, line 6 to page 40, line 6. A further exemplary embodiment of fingerprinting and database formation is described in U.S. patent application Ser. No. 12/955,416 filed Nov. 29, 2010, FIGS. 6-12, and page 16, line 8 to page 29, line 15. An exemplary embodiment of a system and database formation process is described in U.S. patent application Ser. No. 12/141,337 filed Jun. 18, 2008, FIGS. 1A, 1B, 1C, and 4-7, and page 6, line 15 to page 14, line 18 and page 21, line 11 to page 24, line 21. Another exemplary embodiment of a system and database formation process is described in U.S. patent application Ser. No. 12/772,566 filed May 3, 2010, FIGS. 1-3, and page 10, line 10 to page 23, line 5. A further exemplary embodiment of a system and database formation process is described in U.S. patent application Ser. No. 12/788,796 filed May 27, 2010, FIGS. 1, 2A and 2B, and page 6, line 14 to page 13, line 2. An exemplary embodiment of query search is described in U.S. patent application Ser. No. 12/141,163 filed Jun. 18, 2008, FIG. 17, and page 28, line 19 to page 29, line 6. Another exemplary embodiment of query search is described in U.S. patent application Ser. No. 12/141,337 filed Jun. 18, 2008, FIGS. 2A-3B and 8, and page 14, line 19 to page 21, line 10 and page 24, line 22 to page 26, line 10. Another exemplary embodiment of query search is described in U.S. patent application Ser. No. 12/612,729 filed Nov. 5, 2009, FIG. 15, and page 31, line 14 to page 32, line 15. Another exemplary embodiment of query search is described in U.S. patent application Ser. No. 12/772,566 filed May 3, 2010, FIGS. 10-13, and page 40, line 7 to page 45, line 14. A further exemplary embodiment of query search is described in U.S. patent application Ser. No. 12/788,796 filed May 27, 2010, FIGS. 3-10, and page 13, line 3 to page 43, line 2. Modifications of the above illustrative approaches or other approaches may be employed consistent with the teachings of the present invention.

The following discussion now focuses on further details of the search system. First, a search system composed of a single server is described. Later, a multi-server search system is described.

A single search server has two main databases that store video fingerprints of reference videos. Each fingerprint is a string of bits of specified length and associated metadata such as the frame number, video name, type of the signature and the like. The fingerprint may also contain the compact cluster key. A cluster key is a string of bits of smaller length than that of the main fingerprint. A cluster key can be generated at the search server using a hashing algorithm. A single search server may use a predefined hashing algorithm to generate a cluster key for a given reference signature.

An example cluster key generation process 300 is presented in FIG. 3A. In the example, a fingerprint 302 is shown having 128 bits and a generated cluster key 304 is shown having 32 bits. For every four bits from the fingerprint 302, one bit of the cluster key 304 is generated using an XOR operation on all of the four bits. Specifically, bit number one 308 in the cluster key 304 is obtained by performing XOR operation 306 on bits x1, x2, x3 and x4 from the fingerprint 302. In similar manner, the thirty second bit 312 in the cluster key 304 is obtained by performing XOR operation 310 on bits x125, x126, x127 and x128 from the fingerprint 302.
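The XOR folding of process 300 can be sketched in a few lines. The function name `cluster_key_from_fingerprint` and the representation of the fingerprint as a list of 0/1 integers are assumptions made for this illustration.

```python
def cluster_key_from_fingerprint(bits):
    """Fold a 128-bit fingerprint into a 32-bit cluster key.

    Each cluster key bit is the XOR of one consecutive group of four
    fingerprint bits: (x1..x4) gives bit 1, ..., (x125..x128) gives bit 32.
    """
    if len(bits) != 128:
        raise ValueError("expected a 128-bit fingerprint")
    key = []
    for i in range(0, 128, 4):
        x1, x2, x3, x4 = bits[i:i + 4]
        key.append(x1 ^ x2 ^ x3 ^ x4)
    return key

# Every group of four bits below contains a single 1, so every key bit is 1.
fingerprint_302 = [1, 0, 0, 0] * 32
cluster_key_304 = cluster_key_from_fingerprint(fingerprint_302)
```

Because each key bit is the parity of its group, flipping any single fingerprint bit flips exactly one cluster key bit.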

The generated cluster key 304 can be modified using other information such as metadata associated with this fingerprint 302. FIG. 3B depicts a process 320 that uses metadata information at step 324 to modify an intermediate cluster key at step 326. The intermediate cluster key at step 326 is generated by applying a hashing algorithm such as the cluster key generator process 300 to a fingerprint obtained at step 322. The intermediate cluster key at step 326 obtained in this process can be modified by appending more bits based on the metadata obtained at step 324. For example, the intermediate cluster key at step 326 can be appended with eight more bits to form a final cluster key at step 328 that uniquely distinguishes categories of multimedia content. For example, “binary 00000001” may indicate a basketball game and “binary 00000010” may indicate a baseball game. The binary numbers 00000001, 00000010 are predefined to represent specified categories. The intermediate cluster key at step 326 can also be modified using metadata obtained at step 324. For example, the eight bits that identify the TV channel associated with this reference multimedia can replace the last eight bits of the cluster key.
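Both modifications of the intermediate cluster key can be illustrated with integer bit operations. In this sketch the cluster key is held as an integer, "the last eight bits" is assumed to mean the eight low-order bits, and the helper names and the channel value are hypothetical; only the two category codes come from the example above.

```python
# Category codes from the example; any further codes would be predefined likewise.
CATEGORY_CODES = {"basketball": 0b00000001, "baseball": 0b00000010}

def append_category(intermediate_key, category):
    """Append eight category bits to the intermediate key (32 bits -> 40 bits)."""
    return (intermediate_key << 8) | CATEGORY_CODES[category]

def replace_last_eight_bits(intermediate_key, channel_id):
    """Overwrite the low-order eight bits with an 8-bit TV channel identifier."""
    return (intermediate_key & ~0xFF) | (channel_id & 0xFF)

intermediate = 0xABCD1234
final_appended = append_category(intermediate, "baseball")
final_replaced = replace_last_eight_bits(intermediate, 0x7F)
```

Appending lengthens the key and so partitions each original cluster by category, whereas replacement keeps the key length fixed at the cost of discarding eight hash bits.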

FIG. 3C presents a process 340 to decide whether to generate a final cluster key at the search server. At block 344, the search server reads all the fingerprints for a given unit of multimedia content, such as a clip of a video. At decision box 346, a determination is made whether the signature data contains cluster keys. If cluster keys are found then the signature data is passed directly to block 350. At block 350, a cluster index and a multimedia signature index are built. If cluster keys are not present, the process 340 proceeds to block 348. At block 348, cluster keys are generated as described in the process 320. The generated cluster keys are then passed to the next state at connector 352.

The single search server system continues with the creation of the cluster index in reference to FIGS. 4A, 4B, and 4C. When the search server starts, at block 344 of FIG. 3C, it reads all the fingerprint data for all of the units of reference multimedia content that need to be loaded at this search server. The search server then builds a cluster index and the multimedia signature index from the fingerprint data as described below.

Briefly, fingerprints having the same cluster key are grouped and stored together in an array. The beginning of this array is then indexed using the associated cluster key. Metadata of each signature along with the actual signatures are also stored in this array. Signatures from a single video are preferably stored consecutively in one chunk, though alternative methods of storing are not precluded.

A cluster index is an associative array that maps a cluster key to the array of signatures associated with that cluster key. Each entry in the cluster index may also include other metadata information such as the number of signatures associated with this hash entry, a list of the videos that have some signatures associated with this hash entry, and the like.

FIG. 4A illustrates an exemplary primary cluster index structure 400 built using the cluster keys, fingerprints and metadata. The cluster index structure 400 shows details of two clusters 408, 410 that are accessed using key 2 403 and key k 405 within the cluster index array 402. Accessing link references stored at the key 2 403 in the cluster index 402 provides access to metadata 404 and also to the array of signatures 408 associated with this cluster key 2 403. In this example, the array of associated signatures 408 stores the signatures of various videos having the same cluster key 2 403. A consecutive block 411 of signatures 1 to N−1 corresponds to a subset of signatures from video 1 having cluster key 2 403. These signatures are followed by another block 412 of signatures associated with video 2, after which there is a single signature 414 of video 1 again and two empty spaces 416 to hold two more signatures of any other video. Similarly, key k 405 has a link reference to metadata 406 and also to an array of associated signatures 410. Video 1 and video 2 are shown to have fingerprints in this array 410 as well.

It is not necessary to have only a single cluster index. FIG. 4B shows a multiple signature index database 420 that can be built if there are different types of signatures. Each type of signature would be placed in a corresponding cluster index such as cluster index-1 422, cluster index-2 423, and including up to cluster index-N 426. For example, if two different algorithms are used to generate fingerprints, two cluster indexes, one for each type of fingerprint, could be created. Use of multiple cluster indexes can improve identification, reduce false positives and improve the speed at which video identification is performed. Multiple cluster indexes may also be built from a single type of signature with multiple different cluster keys. In such cases each cluster index would be built from a particular type of cluster key.

FIG. 4C shows a process 440 for creating a cluster index structure. At block 444, the signatures from the units of multimedia content to be loaded at this search server are read and the cluster keys are generated if not already available. At block 446, a cluster index array is created. For each cluster key, the number of signatures falling in the cluster is calculated, and an array with enough space to store all these signatures in the cluster is obtained. At block 448, each signature is placed in the appropriate array associated with the cluster key of this signature.
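The count, allocate, and place sequence of process 440 might look like the following sketch. The record layout `(cluster_key, signature, metadata)`, the dictionary used in place of a cluster index array, and the function name are assumptions for illustration.

```python
from collections import Counter

def build_cluster_index(records):
    """Build a cluster index from (cluster_key, signature, metadata) records."""
    records = list(records)
    # Block 446: count the signatures per cluster key and allocate exact-size arrays.
    counts = Counter(key for key, _, _ in records)
    index = {key: {"count": 0, "array": [None] * n} for key, n in counts.items()}
    # Block 448: place each signature, with its metadata, in its cluster's array.
    for key, signature, metadata in records:
        cluster = index[key]
        cluster["array"][cluster["count"]] = (signature, metadata)
        cluster["count"] += 1
    return index

records = [
    ("A", "s1", {"video": 1}),
    ("B", "s2", {"video": 1}),
    ("A", "s3", {"video": 2}),
]
cluster_index = build_cluster_index(records)
```

Reading the records in video order, as here, keeps the signatures of a single video consecutive within each cluster array.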

The single server search system continues with the creation of the multimedia signature index in reference to FIG. 5A and FIG. 5B. For each unit of multimedia content that is to be included in the reference database, an array or arrays of fingerprints associated with that unit of multimedia content is created. A multimedia signature index maps a multimedia content id to the fingerprint array associated with that multimedia identification (id). As an example, FIG. 5A depicts an exemplary multimedia signature index organization 500 based on multimedia ids. Two fingerprint arrays 504, 506 are associated with multimedia id 2 503. Each fingerprint array 504 and 506 is allocated for one particular type of fingerprint.

FIG. 5B illustrates a process 520 to build a multimedia signature index organization as described with regard to FIG. 5A. At block 524, the signatures are read. At block 526, the number of signatures of each type is calculated for the multimedia content to be stored in the database. Also at block 526, arrays of appropriate size to store these signatures are allocated. At block 528, the signatures are placed in a corresponding array. In these signature arrays, the signatures are preferably arranged in increasing timestamp order.

In the single server search system, signature sharing is illustrated between cluster indexes and multimedia signature indexes in FIGS. 5C and 5D. Returning to FIGS. 4A and 5A, signatures along with their associated metadata are stored separately in the cluster index 400 and multimedia signature index organization 500. Signatures can also be shared across these two data structures by sharing link references. Such sharing would reduce the memory capacity required to store the same amount of reference data and thus increase the capacity of the reference database for new content. A number of advantageous ways of sharing signatures across these two data structures are described in more detail below.

In a first approach, shown in FIG. 5C, a multimedia signature index stores a link reference to signatures in a cluster index data structure. FIG. 5C illustrates a data sharing organization 540 that is configured with an exemplary cluster index 546 and multimedia signature index 542 that supports sharing fingerprints in accordance with the present invention. In FIG. 5C, the cluster index 546 is shown with details for two cluster keys, key-3 541 and key-4 543. Key-3 541 is shown with a link reference to fingerprint array 548 and key-4 is shown with a link reference to the fingerprint array 550. The multimedia signature index 542 illustrates an entry 547 reference for multimedia-1. The signature array 544 referenced by the multimedia-1 entry 547 stores link references for link paths 552, 554, 556, 558, 560 to the associated signatures within the fingerprint arrays 548 and 550 associated with the cluster index 546.

FIG. 5D illustrates another exemplary cluster index and multimedia index data structure 570 that supports sharing multimedia fingerprints in accordance with the present invention. In this approach, multimedia signature index 572 maintains a list of “start and end pointers” in a signature array 574 to signatures in fingerprint arrays 578 and 580 associated with cluster index 576. The cluster index 576 illustrates the details for key-3 and key-4 with the associated fingerprint arrays 578 and 580. The multimedia signature index 572 is shown with the details of the signature array 574 that stores entries for various multimedia signature arrays, such as entry 571 for a unit of multimedia content, multimedia 1. The signature array 574 is a linked list of entries, such as entries 582, 584, 586, that hold start and end pointers to different signature blocks in the fingerprint arrays 578 and 580. For example, linked list entry 582 holds start pointer 588 and end pointer 590, linked list entry 584 holds the pointers 592 and 594, and linked list entry 586 holds the start pointer 596 and end pointer 598.
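The start and end pointer approach of FIG. 5D can be sketched as follows, so that signatures are stored once in the cluster arrays and only block boundaries are stored per multimedia id. The sample contents are invented for illustration, a Python list of `(cluster key, start, end)` tuples stands in for the linked list of entries, and the end position is treated as exclusive, which is an assumption of this sketch.

```python
# Fingerprint arrays owned by the cluster index (illustrative contents).
cluster_arrays = {
    "key-3": ["m1-a", "m1-b", "m2-a", "m1-c"],
    "key-4": ["m2-b", "m1-d", "m1-e"],
}

# Multimedia signature index: each id maps to blocks of (cluster key, start, end).
media_blocks = {
    "multimedia-1": [("key-3", 0, 2), ("key-3", 3, 4), ("key-4", 1, 3)],
}

def signatures_for(media_id):
    """Collect a unit's signatures by following its start/end pointers."""
    collected = []
    for key, start, end in media_blocks[media_id]:
        collected.extend(cluster_arrays[key][start:end])
    return collected

shared = signatures_for("multimedia-1")
```

Compared with the per-signature link references of FIG. 5C, one entry here covers a whole consecutive block, so fewer references are stored when a unit's signatures are stored in runs.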

FIG. 6A illustrates a process 625 for updating a cluster index structure when fingerprints for a new unit of multimedia content are received in accordance with the present invention. While the search server is actively running multimedia identification operations, the reference database might have to be updated by adding fingerprints for new units of multimedia content. Updating a reference database has two main components. The first component is to update the cluster index and the second component is to update the multimedia signature index.

In FIG. 6A, at block 630, the signatures are read. Cluster keys for all signatures are also read or calculated in the same step. At block 632, the signatures are grouped according to the cluster key and a loop is prepared in which the following steps 634-642 are repeated for each cluster and for each signature in the cluster. Also, the number of new signatures falling in a cluster is determined in the loop. Using that information at decision block 632, it is determined if the current cluster size, which is the size of the signature array associated with the cluster key, is large enough to store the additional new signatures. If enough space is available, the process 625 proceeds to block 642 and the new additional signatures are copied into the signature array, for example, at the end following the last of the original signatures. The associated metadata, such as the number of fingerprint entries in this array associated with that cluster index entry, is also modified to reflect the changes made. At decision block 644, it is determined whether each cluster and each signature in the cluster has been processed. If more processing is to be done, the loop returns to the decision block 632.

At decision block 632, if enough space is not available to store the new signatures, the process 625 proceeds to the block 634. At block 634, a new array, sufficient to hold the original number of signatures in that particular cluster as well as the new additional signatures to be added into this cluster, is obtained from the system. Also at block 634, the original signatures from the old array are copied into the newly obtained array. At block 636, an exclusive write lock on the newly established cluster index is obtained. At block 638, a pointer from the cluster index array for this cluster entry is modified to point to the new array. The old array is de-allocated and returned to the operating system. At block 640, the exclusive write lock on the original array is released. At block 642, the new signatures are added to the newly established array since it has enough space to store additional signatures. The process 625 returns to the decision block 632. At decision block 644, it is determined whether each cluster and each signature in the cluster has been processed. If more processing is to be done, the loop returns to the decision block 632.

FIG. 6B illustrates an exemplary state diagram 600 showing a cluster index structure before and after an update that added fingerprints for a new unit of multimedia content in accordance with the present invention. Before the update, cluster array 604 associated with key 2 601 has space to store four signatures. The first two entries 603 are from multimedia 1 and the last two spaces 605 are empty. When an update for video 2 is received, a calculation determines that there are 10 signatures that need to be added to cluster signature array 604 with key 2 601. As the array size associated with cluster array 604 with key 2 601 is only four, it cannot accommodate all of the new signatures. Hence an array of size greater than 12, based on old signature count plus 10 new signature count, is allocated. In the example, the new array size is 13. After the update, the cluster array 612 has 13 spaces to store the signatures. The first two spaces are of multimedia 1, the next 10 signatures 614 are the newly added multimedia signatures and the last one space is empty. Rather than obtain an exclusive write lock for each cluster separately, an exclusive write lock can be obtained for a plurality of clusters that need a reallocation because of insufficient space.
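The reallocation path of process 625, using the sizes from the FIG. 6B example, might be sketched as follows. The `Cluster` class, the `spare` parameter that controls how much slack is allocated, and the use of `threading.Lock` as the exclusive write lock are assumptions for illustration; in Python the old array is reclaimed by the garbage collector rather than explicitly returned to the operating system.

```python
import threading

class Cluster:
    """Signature array for one cluster key, with spare capacity at the end."""

    def __init__(self, capacity):
        self.array = [None] * capacity
        self.count = 0
        self.lock = threading.Lock()  # stands in for the exclusive write lock

    def add(self, new_signatures, spare=1):
        needed = self.count + len(new_signatures)
        if needed > len(self.array):
            # Block 634: obtain a larger array and copy the original signatures.
            bigger = [None] * (needed + spare)
            bigger[:self.count] = self.array[:self.count]
            # Blocks 636-640: swap the array reference under the write lock.
            with self.lock:
                self.array = bigger
        # Block 642: copy the new signatures after the last original signature.
        for signature in new_signatures:
            self.array[self.count] = signature
            self.count += 1

cluster_key_2 = Cluster(capacity=4)
cluster_key_2.add(["m1-1", "m1-2"])                    # fits: two of four slots used
cluster_key_2.add([f"m2-{i}" for i in range(1, 11)])   # 10 more: forces reallocation
```

With `spare=1`, the second call allocates 2 + 10 + 1 = 13 slots, matching the array size shown after the update in the example above.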

FIG. 6C illustrates a process 650 where such an exclusive lock is obtained for all clusters that need more space for the new signatures in accordance with the present invention. The process 650 is similar to process 625. The process at block 654 makes calculations similar to those made at decision block 632 in FIG. 6A. The only difference is that, instead of performing these calculations and actually updating each cluster separately, calculations for all the clusters are done together at block 654. Then, instead of acquiring an exclusive lock for each cluster as shown at block 636 in FIG. 6A, at block 656 and block 658, an exclusive lock is obtained for all clusters with insufficient size. An actual update process described by blocks 660, 662, 664, and 666 of FIG. 6C operates in a similar manner as described in blocks 636, 638, 640 and 642 of FIG. 6A. The process blocks 656-666 are repeated if further processing of clusters is required.

A suitable method to update the multimedia signature index is similar to the process described in regard to FIGS. 5A and 5B.

While the search server is actively running multimedia identification operations, the reference database can be updated by removing a unit of multimedia content from it. Updating the reference database by deleting fingerprints has two main components. The first component is to update a cluster index by removing signatures associated with the associated video. The second component is to update the multimedia signature index by removing the signatures associated with the associated video. FIG. 7A illustrates a process 720 for updating a cluster index structure to remove fingerprints associated with a unit of multimedia content to be deleted in accordance with the present invention. FIG. 7B illustrates an exemplary state diagram 700 showing a primary index structure before and after deleting fingerprints for a unit of multimedia content to be deleted in accordance with the present invention.

In process 720 of FIG. 7A, a new request to delete a unit of multimedia content from the reference database is received. At block 722, a list of the cluster index entries that have at least one signature associated with this unit of multimedia content to be deleted is prepared. For example, if the original signature data file is available, then the data file can be read and a cluster index entry list can be created using the data that was read. Also, a multimedia signature index structure can be read to obtain the signature and cluster key data and from such information a list of cluster index entries can be prepared. In addition to the above approaches, metadata associated with each cluster could be read to check for presence of the multimedia content to be deleted. From this metadata information, a list of cluster index entries can be prepared. Further, all of the cluster index entries and associated signatures may be examined to prepare such a list.

At block 724, each cluster in this list that needs modification is determined. At block 726, an exclusive read-write lock is obtained for each cluster. Note that the exclusive write lock can also be acquired at once for a plurality of clusters that need the modification. At block 728, a calculation is made for start and end positions of all signature blocks associated with the unit of multimedia content. Thus, in the state diagram 700, for the cluster associated with key 3 701, the start and end positions of the signature block to be deleted are determined to be entry 2 703 up to and including entry 11 705. At block 730, the signatures to be deleted are removed and all of the signatures that follow this block are shifted according to the space of the signatures that are removed so as to coalesce the signatures before and after the block to be deleted. Thus, after the removal is completed, all active signatures are stored consecutively. In the state diagram 700, 10 signatures are removed for the cluster associated with key 3 701, resulting in a reduced signature array 712. The signatures that are deleted in this way can be moved to the end of the signature array and marked as deleted. This approach enables the deleted content to be rapidly re-added into the reference database if ever needed. At block 730, metadata associated with the cluster is also updated to reflect the new number of signatures and multimedia content present in this cluster.

As a result of operations at block 730, at the block 732, the size of the signature array that holds the signatures is optionally and dynamically adjusted to free up some of the memory. In the state diagram 700, the cluster array 712 with key 3 701 now has only three active signatures after deleting 10 signatures associated with the unit of multimedia content that was deleted. The total size of array 704 before deletion was 14, out of which only 3 signatures remain after deletion in the array 712. Hence the array size is dynamically readjusted by reallocating the array or by returning the extra space for 10 signatures. After this operation, the size of array 712 becomes 4. At block 734, the exclusive write lock is released. If further clusters need modification, the process 720 returns to block 726.
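The remove, coalesce, and shrink steps of blocks 728-732 can be illustrated with the sizes from state diagram 700. The function `delete_media`, the `(media_id, signature)` record layout, and the `spare` slack parameter are assumptions for this sketch, and locking is omitted for brevity.

```python
def delete_media(array, count, media_id, spare=1):
    """Remove one unit's signatures from a cluster array.

    `array` holds (media_id, signature) records in its first `count` slots.
    Returns the resized array and the new number of active signatures.
    """
    # Block 730: drop the unit's records; survivors close up and stay consecutive.
    kept = [record for record in array[:count] if record[0] != media_id]
    # Block 732: shrink the array, keeping a little spare room for later additions.
    resized = kept + [None] * spare
    return resized, len(kept)

# State before deletion: size 14, entry 1 kept, entries 2-11 to delete, two more kept.
array_704 = (
    [("m1", "a")]
    + [("m2", f"s{i}") for i in range(10)]
    + [("m1", "b"), ("m3", "c")]
    + [None]
)
array_712, active = delete_media(array_704, count=13, media_id="m2")
```

Because records are filtered by multimedia id rather than by one start and end position, the same sketch also covers a unit whose signatures sit in several non-consecutive blocks.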

FIG. 7C illustrates a process 750 to delete fingerprints from the multimedia signature index (MSI). The following actions are taken to delete a video from the multimedia signature index. At block 754, the multimedia signature index is traversed to find the multimedia content entry to be deleted. At block 758, an exclusive write lock is acquired to update the multimedia signature index. At block 760, the signature array associated with this multimedia content is deleted and the memory is returned to the operating system. Instead of removing the multimedia entry, that entry in the multimedia signature index can simply be marked as deleted or inactive. At block 762, the exclusive write lock is released. FIG. 7D illustrates an exemplary state diagram 770 that shows the multimedia signature index structure before and after content has been deleted from it according to the present invention. The multimedia signature index 772 holds a reference pointer to the data structure 774 holding a signature array along with metadata information for the multimedia id 2 771 before a deletion operation deletes multimedia id 2. When a request to delete multimedia 2 is processed, the multimedia 2 entry 775 in the multimedia signature index is marked as deleted, and the pointer to the data structure that holds the metadata and signature array is freed.

Some applications need the ability to add a unit of multimedia content that is currently being produced, broadcast, streamed or displayed. For example, if a video is long, then these applications may require an ability to update a reference database with parts of the video before the end of the video. Updates that are carried out in this fashion are called live event updates. As an example, a 60 minute video may be added into the reference database in three separate update chunks or units, each of 20 minutes. Note that multiple devices may request updating the same video with different or overlapping parts of a video. The methods presented here do not depend on a device or a number of devices requesting such an update. FIGS. 8A, 8B, and 9 illustrate aspects of live event updating of a reference database by adding new units of multimedia content in real time.

In a first method for live event updates, each update part of the multimedia content is treated as a separate unit of multimedia content with a different video id. This case is similar to the normal multimedia content update described with regard to FIGS. 6A and 6B.

In a second method for live event updates, each new update can be processed by deleting previously added live event updates for this multimedia content and then re-adding a combined update of all of the previous live event update parts and the new live event update part. In this case, deletion of previous live event update parts is performed as described with regard to FIGS. 7A-7D and then the combined update is performed as described with regard to FIGS. 6A and 6B.

In a third method, each update is treated as a part of the same unit of multimedia content. The cluster index and the multimedia signature index are updated accordingly as described with regard to FIG. 8A and FIG. 8B.

FIG. 8A illustrates an exemplary state diagram 800 showing a cluster index structure before and after an intermediate live event update for portions of multimedia content in accordance with the present invention. In FIG. 8A, each update chunk follows a similar procedure of adding a new unit of multimedia content in the reference database as described with regard to FIGS. 6A and 6B. However, a strict adherence to the previous process 625 for such additions would make the fingerprint block for live event updated multimedia content non-contiguous. After a first update, the signature array 804 for the cluster with key 3 801 of the cluster index 802 has a two signature chunk 803 for live event multimedia 1. After a second live event update chunk is processed, the updated signature array 808 has two chunks of signatures, those associated with the chunk 803 and those associated with signature chunk 805 for the live event multimedia 1 in memory. The two signature chunks 803 and 805 are not contiguous. The process to update the cluster index for a live event update is the same as that described in FIG. 6A.

FIG. 8B illustrates an exemplary state diagram 820 showing a multimedia index structure before and after an intermediate live event update for a portion of multimedia content in accordance with the present invention. A multimedia signature index stores all of the video signatures in one array. Hence, to store additional signatures for a new live event update chunk, a check is made for enough memory to store additional fingerprints. If enough memory is present, then the new fingerprints for the live event update chunk are added in the fingerprint array associated with the multimedia content. If sufficient memory is not present, more memory is first obtained followed by copying old and new fingerprints into a new array. Before the current update is processed, the signature array 824 associated with the live event multimedia 1 821 of the multimedia signature index 822 has two signatures 823 from the first update. The size of the signature array 824 is four. After the update is processed, the signature array 828 has three more signatures 833 from the second update along with the two signatures 835 from the first update, making the total number of signatures equal to five. Because there was not enough space to store all of the five signatures in the original signature array 824, a new signature array 828 allocation is made to increase the array capacity to hold five or more signatures. In this case, the allocation size was seven so two signature spaces remain empty. Note that the chunk 2 signature-3 838 has a different cluster key than that of FP-1 836 and FP-2 837 from chunk 2, hence in FIG. 8A in an exemplary cluster index structure, the signatures associated with cluster key 3 801 do not contain chunk 2, signature 3.

FIG. 8C illustrates an exemplary process 850 to update the multimedia signature index for a live event update in accordance with the present invention. When a new live event update request is received, it is first read to receive all the new signatures at step 852. Then an exclusive write lock is obtained on the multimedia index entry to be updated in the multimedia signature index at step 854. A determination is made at step 856 whether the received live event update chunk is the first for the requested live event multimedia update. If this is indeed a first chunk, then at step 858 a new storage allocation large enough to store the new signatures is made. The signatures from the chunk are stored in timestamp order in this signature array at step 860, and at step 862 the exclusive lock is released. If, at decision box 856, a determination is made that the live event update request is not the first, then another determination is made at decision box 864 about the size of the already existing signature array. If the size of the existing signature array is sufficient to store the new signatures in addition to the existing ones, the process goes to step 860. If the size is not sufficient, then a new signature array storage allocation sufficient to hold the old and new signatures is made at step 866. The process then proceeds to step 860.
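By way of a non-limiting illustration, the flow of process 850 may be sketched as follows. The class and method names, the per-entry lock object, and the `headroom` growth policy are assumptions made for this sketch and are not taken from the figures; a headroom of two slots happens to reproduce the capacities of four and seven shown in FIG. 8B. Creation of a new index entry is not itself synchronized in this sketch.

```python
import threading
from dataclasses import dataclass, field


@dataclass
class IndexEntry:
    """One multimedia signature index entry (hypothetical layout)."""
    capacity: int = 0                                  # allocated slots
    signatures: list = field(default_factory=list)     # used slots only
    lock: threading.Lock = field(default_factory=threading.Lock)


class MultimediaSignatureIndex:
    def __init__(self, headroom=2):
        self.entries = {}
        self.headroom = headroom   # assumed over-allocation policy

    def live_update(self, content_id, chunk):
        """chunk is a list of (timestamp, signature) pairs (step 852)."""
        entry = self.entries.setdefault(content_id, IndexEntry())
        with entry.lock:                               # steps 854 and 862
            needed = len(entry.signatures) + len(chunk)
            if needed > entry.capacity:                # steps 856/858 or 864/866
                entry.capacity = needed + self.headroom
                entry.signatures = list(entry.signatures)   # copy into new array
            entry.signatures.extend(sorted(chunk))     # step 860, timestamp order
        return entry
```

A first chunk of two signatures followed by a second chunk of three signatures leaves the entry with five signatures in an array of capacity seven under these assumptions.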

Note that the process of updating the cluster index and the multimedia signature index due to a new live event update request does not depend on the source of the update request. Thus, a first update chunk request of a live event may come from one source, a second update chunk request of the live event may come from another source, and a third update chunk request may come from yet another source. The search server is therefore oblivious to the source of these live event update requests.

FIG. 9 illustrates an exemplary state diagram 900 showing a cluster index structure before and after deleting a portion of multimedia content that was previously added using a live event update process in accordance with the present invention. The procedure of deleting a video that has been added to a reference video as a part of the live event update is similar to that of deleting any other multimedia content as described with regard to FIG. 7A. One difference is related to the non-consecutive chunks of fingerprints in the cluster index structure which may result from a live event update. In FIG. 9, the state diagram 900 shows an exemplary cluster index 902 with a first signature array 904 before a deletion update and a second signature array 914 after the deletion update. Before the deletion update, the first signature array 904 contains two chunks of signatures 906 and 908. When these fingerprints are removed, both chunks 916 and 918 are first marked for deletion in the signature array 914. Then, the other signatures in the signature array 914 are moved so as to occupy consecutive positions, such that the altered signature array 924 is produced. Note that the first signature array 904 is resized after the deletion update, resulting in the signature organization shown in the signature array 924.
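The mark-then-compact deletion of FIG. 9 may be illustrated by the following minimal sketch. The representation of a cluster's signature array as a list of (content id, signature) pairs and the function name `delete_content` are assumptions made for illustration only.

```python
def delete_content(signature_array, content_id):
    """Remove every signature of content_id from one cluster's array, in place."""
    deleted = object()   # sentinel used to mark slots for deletion

    # Pass 1: mark all chunks belonging to the content, contiguous or not.
    for i, entry in enumerate(signature_array):
        if entry[0] == content_id:
            signature_array[i] = deleted

    # Pass 2: move the surviving signatures into consecutive positions.
    write = 0
    for entry in signature_array:
        if entry is not deleted:
            signature_array[write] = entry
            write += 1

    # Resize the array to the number of surviving signatures.
    del signature_array[write:]
    return signature_array
```

The relative order of the remaining signatures is preserved, so no other content's fingerprint block is disturbed by the deletion.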

A further aspect of live event updating concerns deleting video from a multimedia signature index. If different live event update chunks of multimedia content are added using different content ids, then deleting a content from the multimedia signature index is the same as deleting the corresponding multiple contents, each deletion using the procedure described with regard to FIG. 7C and the state diagram of FIG. 7D. In that live event case, multiple units of content are deleted from the multimedia signature index. However, if a video is live event updated as described with regard to FIG. 8B, then a single content is removed from the multimedia signature index as described with regard to the state diagram of FIG. 7D and per the process described in FIG. 7C.

FIG. 10A illustrates an exemplary system 1000 having multiple search servers 1002₁, 1002₂, . . . , 1002_N and a distributed reference database in accordance with the present invention. The distributed reference database comprises reference databases 1001₁, 1001₂, . . . , 1001_N, each on a corresponding search server and each storing a portion of the multimedia reference database according to a splitting policy as described in more detail below. A search query dispatcher 1003 is utilized to assign queries to the appropriate portion of the reference database according to the splitting policy. The plurality of search servers may be grouped to form one or more search clusters, and a complete search system may consist of multiple search clusters.

When a reference video database becomes too large to fit on one search server, it can be divided across multiple search servers 1002₁, 1002₂, . . . , 1002_N. A video identification request, formatted as a search query, is dispatched from the search query dispatcher 1003 to all of the search servers 1002₁, 1002₂, . . . , 1002_N to search across the distributed reference database. For example, a similarity search, as described with regard to FIG. 2C, is done at each queried search server for access to the split portion of the reference multimedia database. Each candidate list resulting from the similarity search is further processed, as described with regard to FIG. 2D, to produce a list of top clips. As described with regard to FIG. 2E, for each of the clips in the top clips list 244, a set of reference signatures is accessed from the split portion of the reference multimedia database based on the multimedia identifier and any temporal and/or spatial information. This set of reference signatures is compared against the query signatures using a signature correlation procedure. For each query signature, a score is derived, and these scores are combined for the entire set of query signatures to give an overall score for a particular reference database clip. The signature correlation procedure is repeated for all clips in the top clips list to produce a list of matching reference videos. A separate result combiner 1008 process then collects results from all individual search servers 1002₁, 1002₂, . . . , 1002_N and generates final results after merging and evaluating the combined results.
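As a hedged illustration of the dispatch and combine steps, the following sketch sends one query to every search server in parallel and merges the partial results. Each server is modeled as a callable returning (clip id, score) pairs, and the merge rule (keep the highest score per clip, then rank) is an assumption; the evaluation performed by the result combiner 1008 is not specified at this level of detail.

```python
from concurrent.futures import ThreadPoolExecutor


def dispatch_and_combine(query, servers, top_k=3):
    """Send query to all servers and merge their match lists.

    servers: non-empty list of callables, each returning [(clip_id, score), ...]
    for its own split portion of the reference database.
    """
    # Dispatcher: every search server receives the same query.
    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        partial_results = list(pool.map(lambda search: search(query), servers))

    # Result combiner: keep the best score seen for each clip.
    best = {}
    for matches in partial_results:
        for clip_id, score in matches:
            if score > best.get(clip_id, float("-inf")):
                best[clip_id] = score

    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]
```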

The distributed reference database can be arranged in many ways. For example, the search system 1000 is configurable to assign the selected portion of the reference database according to a random splitting of the reference database. In this arrangement, selected multimedia content of a plurality of multimedia contents can be randomly assigned to one of the search servers in the search cluster. Then, the signatures associated with the selected portion of the reference database are loaded in a local cluster index on the assigned search server as well as in the multimedia signature index of that particular search server.

In another method of arranging the distributed reference database, the plurality of multimedia contents are split into specified categories. Common examples of categories for movies are romance, thriller, animation, family, comedy, and other identifiable categories. Multimedia content is categorized through examination of metadata that may accompany the content. A movie category may be interpreted by a multimedia content analysis, for example. Also, an unknown category can be used when a specific category cannot be determined. Each search server holds a reference database for multimedia contents from a single category or multiple categories. This arrangement of the reference database is termed a categorical splitting of the reference database.

The opposite of categorical splitting of the reference database is reverse categorical splitting. In the reverse categorical splitting method, multimedia contents within the same category are purposefully distributed across as many different search servers as possible. The categorical splitting minimizes the information entropy of the distributed reference database while the reverse categorical splitting maximizes it.
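The three splitting policies may be compared with a small sketch. Everything in it is an illustrative assumption: content is given as (content id, category) pairs, categorical splitting assigns each category a home server, reverse categorical splitting deals each category out round-robin, and the "information entropy" is read here as the mean Shannon entropy of the category mix held by each server.

```python
import math
import random
from collections import Counter, defaultdict


def random_split(items, n_servers, rng=random):
    return {cid: rng.randrange(n_servers) for cid, _ in items}


def categorical_split(items, n_servers):
    # Each category gets one home server; similar content stays together.
    categories = sorted({cat for _, cat in items})
    home = {cat: i % n_servers for i, cat in enumerate(categories)}
    return {cid: home[cat] for cid, cat in items}


def reverse_categorical_split(items, n_servers):
    # Members of one category are spread over as many servers as possible.
    next_slot = defaultdict(int)
    assignment = {}
    for cid, cat in items:
        assignment[cid] = next_slot[cat] % n_servers
        next_slot[cat] += 1
    return assignment


def mean_server_entropy(items, assignment, n_servers):
    per_server = [Counter() for _ in range(n_servers)]
    for cid, cat in items:
        per_server[assignment[cid]][cat] += 1

    def entropy(counts):
        total = sum(counts.values())
        if total == 0:
            return 0.0
        return -sum((n / total) * math.log2(n / total) for n in counts.values())

    return sum(entropy(c) for c in per_server) / n_servers
```

With two categories of equal size on two servers, the categorical split yields a mean per-server entropy of zero bits and the reverse categorical split yields one bit, which is consistent with the minimizing and maximizing behavior described above.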

The choice of the method to arrange a distributed reference database may be based on a single server's search algorithm. Some search algorithms may perform better within a database composed of similar content, while other search algorithms may perform better within a database composed of dissimilar content. Further, for some search algorithms, the composition of the reference database might not be a factor.

All normal updates or live event updates to a reference database are assigned to a specific server as per the distributed reference database splitting policy. FIG. 10B illustrates a process 1025 to decide how to add multimedia content to a distributed reference database in accordance with the present invention. At block 1030, the process 1025 waits until a new update request is received, which may be a normal update or a live event update request. At block 1032, a determination is made whether the update request is a live event update request. If the update request is not a live event update request, the process 1025 proceeds to block 1036. At block 1036, a server is determined by the selected distributed reference database splitting policy and the update is assigned to that server. Returning to block 1032, if the new update request is a live event update request, the process 1025 proceeds to block 1034. At block 1034, a determination is made whether this live event update request is a first chunk for new multimedia content. If this update request is not for a first live event update chunk for new multimedia content, the process 1025 proceeds to block 1038. At block 1038, the same server to which old live event update requests for this multimedia content have been assigned is chosen and the new live event update request is assigned to that server. Returning to block 1034, if a determination is made that this is the first live event update chunk for new multimedia content, then the process 1025 proceeds to block 1036. At block 1036, a server is determined by the selected distributed reference database splitting policy and the update is assigned to that server.
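The decision logic of process 1025 may be sketched as follows. The request format, the `live_assignments` table that remembers which server received earlier chunks, and the `pick_server` callable standing in for the splitting policy are hypothetical; in particular, detecting a first chunk by the absence of a prior assignment is an assumption of this sketch.

```python
def route_update(request, live_assignments, pick_server):
    """Return the server that should receive an update request.

    request: dict with keys "content_id" and "live" (bool).
    live_assignments: dict mapping content_id -> server for live events.
    pick_server: callable implementing the splitting policy (block 1036).
    """
    content_id = request["content_id"]

    # Blocks 1032 and 1034: a later chunk of a known live event.
    if request["live"] and content_id in live_assignments:
        return live_assignments[content_id]          # block 1038

    # Block 1036: normal update, or first chunk of a new live event.
    server = pick_server(request)
    if request["live"]:
        live_assignments[content_id] = server        # remember for later chunks
    return server
```

Keeping all chunks of one live event on the same server is what allows the multimedia signature index entry of FIG. 8B to grow in place.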

FIG. 11 illustrates an exemplary multi-cluster multimedia identification system 1100 in accordance with the present inventions. A set of search servers that together hold a reference multimedia content database is called a search cluster. The speed of the search system can be increased by replicating the search server clusters. Any request to modify the reference database is forwarded across all the replicated search clusters as shown in FIG. 11.

The multi-cluster multimedia identification system 1100 shows three search clusters 1102, 1104, and 1106 with a multimedia search query distributor 1120 and a final result unit 1124. Each cluster has its own result combiner 1110, 1112, 1114, respectively. These result combiners combine results from each individual search machine inside each cluster and then provide each cluster's final result, which may be combined into a system final result in the final result unit 1124 for consumption by the query client. Every new multimedia search request 1120 is distributed by the multimedia search query distributor to one of the clusters, such as cluster 1106, while a reference database modification request 1122 is forwarded to all of the clusters 1102, 1104, and 1106. With three search clusters, three searches may operate in parallel. Also, with multi-threaded search servers, the search servers are operable to work separately on a plurality of search query requests if so assigned.

Replicating the search clusters increases the number of search queries a multi-cluster multimedia identification system can handle within a given amount of time by a factor of N, where N is the number of replicated search clusters.

In many applications of a video identification system, most of the video identification queries would match a very small set of reference videos. For example, most people are interested in watching current TV programs, recent movie releases, and current sports matches. The complete reference database may consist of tens of thousands of videos; however, the most sought video content might be just a fraction of it. In such scenarios, a hierarchical search system can be configured as illustrated in FIG. 12. FIG. 12 illustrates an exemplary three tier hierarchical system 1200 for multimedia identification in accordance with the present invention. The three tier hierarchical system 1200 comprises a query dispatcher 1202, a tier-1 system 1204 coupled to a first reference database 1203, a first tier function unit 1208, a tier-2 system 1210 coupled to a second reference database 1211, a second tier function unit 1212, a tier-3 system 1216 coupled to a third reference database 1217, and a third tier function unit 1218.

The most sought reference content is marked in the reference database contents and is loaded into a first tier, the tier-1 system 1204 of a multi-tier system. The most sought content can be determined in many ways. For example, if the reference database consists of recent movies, then a movie ranking report based on the weekly popularity of currently showing movies could be used to mark the most popular movie content in the reference database. The system can start without any external information about the popularity of the reference multimedia content and then build a popularity index for its reference database contents by analyzing matched results for recent search queries.

Multimedia identification search requests are dispatched by the query dispatcher 1202 and enter the tier-1 system 1204, in which the multimedia identification query is matched against only tier-1 content. If a match is not found, or if certain criteria are not met, for example, confidence in the detected match is less than a predetermined threshold or the length of the match found is less than another predetermined threshold, the multimedia identification query is sent on to the next tier, the tier-2 system 1210, of the three tier hierarchical system 1200. This search system organization is generalized to multiple tiers in a hierarchical organization where the most popular content is organized in the tier-1 system 1204, some of the next most popular content is organized into the tier-2 system 1210, some of the less popular content is organized into the tier-3 system 1216, and so on.

In FIG. 12, the tier-1 system 1204 uses a first reference database 1203 made up of the current most popular content. The tier-2 system 1210 consists of other recent content in a second reference database 1211, while the tier-3 system 1216 consists of the remaining content in a third reference database 1217. A multimedia identification query dispatched by the query dispatcher 1202 is first handled by the tier-1 system 1204. If a match is found, as determined at first tier function unit 1208, the results 1206 may be sent back to the requesting client. If a match is not found, then the tier-2 system 1210 handles the dispatched query. If a match is found in the second tier, as determined at second tier function unit 1212, results 1214 may be returned to the client. Otherwise, the last tier, the tier-3 system 1216, handles the query and determines whether there is a match, as determined at third tier function unit 1218, and the results 1220 are returned to the client.
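The tier-by-tier fall-through may be illustrated with a minimal sketch. Each tier is modeled as a callable returning either `None` or a dictionary with hypothetical `confidence` and `length` fields, and the two threshold values are placeholders chosen for the example rather than values taken from this description.

```python
def tiered_search(query, tiers, min_confidence=0.8, min_length=5.0):
    """Try each tier in order; return (tier_number, result) or (None, None).

    tiers: list of callables ordered from most to least popular content.
    """
    for level, search in enumerate(tiers, start=1):
        result = search(query)
        accepted = (
            result is not None
            and result["confidence"] >= min_confidence   # match confidence test
            and result["length"] >= min_length           # match length test
        )
        if accepted:
            return level, result      # answered at this tier
        # Otherwise the query is sent on to the next tier.
    return None, None                 # no tier produced an acceptable match
```

Because most queries are expected to be accepted at the first tier, the later and larger tiers are invoked only for the remainder.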

The tier structure can also be considered a way of performing reference database caching. Similar to an L1 cache in a computer system, tier 1 contains the references that are most often accessed or matched to search queries.

The tier organization may lead to a smaller system cost due to a number of advantages. For example, the overall computing load of the system may be reduced, thus enabling fast content identification. The tiered structure of the multi-cluster search system delivers a lower number of queries to the tier-2 system 1210 and an even lower number of queries to the last tier-3 system 1216. The multimedia identification time on a single search server is proportional to the reference database size that it holds. As the tiered structure has reduced the load on the second and third tiers compared to the first tier, the reference database size in the tier-2 system 1210 and the tier-3 system 1216 can be increased while still maintaining the search speed required to answer every incoming query to the overall multi-tier identification system. Hence, the tiered structure can also help in increasing the distributed reference database size that can be supported at a particular search speed.

The most sought content can be determined in multiple ways. In a first exemplary approach, the number of search queries that match against a prespecified reference multimedia content over a predefined time interval is counted as a match number by keeping track of all search queries answered by the search system. Then all the reference multimedia contents having a match number greater than a prespecified threshold are termed the most sought videos. In another exemplary approach, the probability of a random search query matching a set of reference multimedia content is prespecified as a probability threshold. Then, a set of reference content that achieves or exceeds this probability threshold of matching a random query is determined. This set of reference multimedia content is then defined as the most sought content. The set of most sought reference content can be determined using a cumulative probability distribution function determined by statistics maintained by the search system of the search results, for example.
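Both exemplary approaches may be sketched over a log of matched content ids. The log format and function names are assumptions, and the second function reads the cumulative probability approach as selecting the smallest set of the most frequently matched contents whose combined empirical match probability reaches the threshold, which is one possible interpretation.

```python
from collections import Counter


def most_sought_by_count(match_log, threshold):
    """Contents whose match number exceeds a prespecified threshold."""
    counts = Counter(match_log)
    return {cid for cid, n in counts.items() if n > threshold}


def most_sought_by_probability(match_log, prob_threshold):
    """Most-matched contents whose cumulative match probability
    reaches prob_threshold (empirical distribution from the log)."""
    counts = Counter(match_log)
    total = sum(counts.values())
    chosen, cumulative = [], 0
    for cid, n in counts.most_common():      # most frequently matched first
        if cumulative / total >= prob_threshold:
            break
        chosen.append(cid)
        cumulative += n
    return chosen
```

For a log in which content "a" matched six times, "b" three times, and "c" once, a count threshold of two selects "a" and "b", and a probability threshold of 0.85 also selects "a" and "b", since together they account for nine of the ten matches.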

If the reference content is a set of movies, then movie popularity ratings can be used to determine the top N popular movies or the top x% of popular movies. These top movies can then be termed the most sought content.

If the reference content is a set of TV shows, then TV show ratings could similarly be used to determine the top N popular shows or the top x% of popular shows. Using this information, the most sought TV shows can be determined.

Further, different tiers can have different numbers of clusters, each with a different number of search controllers, to handle correspondingly different query search loads and distributed reference database sizes. Thus, tier organization and clusters of search servers are two independent concepts, and combinations of these two system configurations can result in an even more flexible and advantageous search system. For example, if a tier-1 system is able to respond to most of the search queries, then the following tiers would have a much smaller load. In such a case, to utilize the tiers efficiently, two tier-1 clusters may share a single tier-2 system as illustrated in FIG. 13.

FIG. 13 illustrates an exemplary two-tier system 1300 for a multi-tier multi-cluster system that achieves reference database scaling as well as performance scaling in accordance with the present invention. The tier 1 search system 1304 is comprised of two search clusters 1305 and 1306, each further comprised of two search servers. Both the search clusters 1305 and 1306 host the same reference multimedia content. Each search cluster 1305 and 1306 has its own result combiner 1307 and 1308, respectively.

When a new search query 1301 is received in the search system 1300, the query distributor 1302 dispatches the search query to the tier 1 search system 1304. The query distributor 1302 selects one of the search clusters, such as search cluster 1306, out of the two search clusters 1305 and 1306 in tier 1 and dispatches the search request 1301. The selection of the search cluster may be done in a purely random fashion, in a round robin way, or according to any other scheduling policy, such as least loaded cluster selection or physically closest cluster selection. The result combiner 1308 combines the results of the various search controllers in the search cluster 1306 and forwards the results to another query distributor 1310. Query distributor 1310 and query distributor 1302 may be implemented as a part of the same process or may be separate processes.

Query distributor 1310 checks the results for a match. If a match is found, then the results are forwarded to the result monitor 1318, which in turn forwards the result to the search client as final results 1320. If the query distributor 1310 does not find a match, it dispatches the search query to tier 2 1312. Tier 2 in this example consists of only one search cluster; hence, the query distributor 1310 does not need to choose a search cluster for this query in tier 2 1312. Also, in this example, a search query is dispatched to all the search servers in tier 2. The result combiner 1315 collects the match information from all the search servers and forwards the result to the result monitor 1318, which in turn forwards the results to the query client as final results 1320. The result monitor 1318 stores the match results from tier 1 and tier 2, and sends the combined processed results as final results 1320 to the query client.

Note that the reference databases in each tier need not be static. Rather, they can be dynamic. Depending on a current profile of queries, the current time, or some other parameter or parameters, the set of reference content in the different tiers can be changed or moved around. For example, some of the reference content from tier 1 that is not so frequently accessed and matched may be transferred to tier 2. To enable this functionality for system 1300, the result monitor 1318 also maintains statistics about the matches and about which reference multimedia contents are matched most often across large sets of queries. Even though it is not shown explicitly in FIG. 13, as described before, any update or delete operations on the reference multimedia content in tier 1 must be carried out for both clusters 1305 and 1306.

In some applications, an objective might be to first find a matching reference video for a queried multimedia content and then track the queried multimedia content as it progresses against a matched set of multimedia contents with minimum overhead and reduced latency. For example, consider an application for the purpose of blocking advertisements on a television set. A reference database is instantiated in the television set and contains one or more television programs without advertisements. Whenever an advertisement is displayed on the television set, a multimedia identification query issued in response to a user's selection would return a “no match”. By having the television set configured to track the program being watched, whenever it detects deviation from the reference program being tracked, it can conclude an advertisement is present. The television set can then take various actions as per the user preference. For example, the television set can simply block the advertisement until the program resumes, which the television set would know by tracking the displayed content on the television. Some other possible actions that the television set can take include switching the channel, going into picture-in-picture mode with another channel, and reducing the audio volume.

FIG. 14A illustrates an exemplary process 1450 for multimedia identification and multimedia tracking at a local client in accordance with the present inventions. This process 1450 is executed by the client with a local multimedia track mode function. Initially, at step 1452, a first search query for searching the currently displayed content on the client is made at the local client and sent to a remote search server. The remote server then returns the match results to the client. At decision step 1454, a determination is made at the client whether the result contains a matching reference video or not. If a matching reference video is found, then the process continues to step 1456. If, instead, a matching reference video is not found at decision step 1454, the process 1450 returns to step 1452 and performs a new search query for searching the currently displayed content on the client at the remote server. The new search query is done after some predetermined time delay to reduce load on the remote search system.

With reference to step 1456, as the client has found a matching reference video, it then downloads signatures for the matching reference video or signatures for the part of the matching reference video around the match. The decision to download all or part of the matching reference multimedia signatures can be taken by the client or by the remote server. Various factors can affect this decision, including but not limited to the size of the matched reference content and the probability of the client continuing to display the same content without changing the displayed program.

After downloading, the process 1450 builds a local search database at step 1458 with the downloaded matching reference multimedia signatures in the reference database. At step 1460, the process 1450 then enters into a track mode. At step 1462, the process 1450 makes new track search requests to the local search database to track the video. For example, the television set searches for the content being displayed for the past five seconds. The track search does not use the similarity search phase of the normal search and instead uses a correlation phase of the normal search. Hence, the track search consumes fewer computational resources and can be performed faster.

At step 1464, a determination is made whether the results returned match the query. If the track search at the local search system returns a match to the same reference, then the process goes back to step 1462, where it can make a new track search query. If, instead, a match is not found at decision step 1464, then at step 1466 a new full search query is made to the local reference database. This full local search is more expensive than the track search query in terms of the computational resources needed. However, the full local search is faster than a remote search query because the full search is performed against a much smaller database. When the local full search results are returned, at decision step 1468, a determination is made whether a match to the same reference video is found. If a match has indeed been found, then the tracking operation is continued by returning to step 1462. Otherwise, it is concluded that the local content has diverged from the reference content, the local track method is abandoned at step 1470, and the process 1450 returns to the beginning step 1452.
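The control flow of process 1450 may be summarized in a short sketch. The three search callables are hypothetical stand-ins for the remote search, the local track search, and the local full search, and the downloaded reference signatures are represented by whatever object the remote search returns. The predetermined delay between remote queries is omitted, and a query that causes track mode to be abandoned is retried remotely in the same iteration, which is an assumption of this sketch.

```python
def run_client(queries, remote_search, track_search, full_local_search):
    """Return a log of (query, search_mode, matched) tuples."""
    local_db = None          # None means the client is not in track mode
    log = []
    for q in queries:
        if local_db is not None:
            if track_search(local_db, q):              # steps 1462 and 1464
                log.append((q, "track", True))
                continue
            if full_local_search(local_db, q):         # steps 1466 and 1468
                log.append((q, "local_full", True))
                continue
            local_db = None                            # step 1470: abandon track mode

        reference = remote_search(q)                   # step 1452
        if reference is not None:                      # step 1454
            local_db = reference                       # steps 1456 to 1460
        log.append((q, "remote", reference is not None))
    return log
```

Only queries issued outside track mode, or queries for which both local searches fail, reach the remote search system.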

FIG. 14B illustrates an exemplary state diagram 1400 of various search queries done at a client that has an ability to perform local multimedia track functions. In this exemplary state diagram 1400, multimedia search queries 1402, 1404, . . . , 1430 are made by the client as time progresses to the right and are shown with details of these search queries. Specifically, for each search query, a brief description of the search method, whether it is a remote search, a local track search, or a local full search, is specified along with a description of results and other actions. At the beginning, the search query 1402 is made at the remote search system. The remote search system returns a no match result 1403, and hence the next search query 1404 is made at the remote search system again. The search query 1404 is answered with a match 1405, and the client downloads the signatures associated with the reference, builds a local search database, and enters into a track mode. The next search query 1406 is made to the local search system as a local track search query which, for example, results in a successful match 1407. Hence, the process is repeated for the search query 1408 with the match results 1409.

The next search query 1410 is first made as a local track search query and, for example, results in a no match finding. Since the local track search failed, the same query is also made as a local full search query 1410. The local full search query 1410 is made with the same signatures and, for example, matching results 1411 are found. The same process with similar results 1413 is repeated for a next query 1412. For the next two queries 1414 and 1416, local track search operations return positive results 1415, 1417, so the track mode continues up to that point in time. However, for the next search query 1418, neither the local track search query nor the local full search query returns a positive match 1419. Hence, it is concluded that the content being played locally has diverged from the reference content. Further, the track mode is abandoned and a remote search query 1418 is performed. In this case, the remote system also does not find a match 1419. The next query 1420 is made directly to the remote search system, which returns a no match indication 1421, and the process is repeated for query 1422 with no match result 1423. However, for the query 1424, the remote search system finds the match 1425, and hence the client downloads the signatures, updates the local search database for this reference, and enters into the track mode. The track mode successfully continues for the next three queries 1426, 1428, 1430 with successful results 1427, 1429, and 1431.

Now consider a scenario where a video is aired on a television channel. A TV or any other device that can receive the TV channel, such as a set top box, may fingerprint the content aired on the television channel and may query the search system to find a matching reference. However, if two such querying devices are tuned to the same channel, there is a considerable chance that both clients may generate the same signatures and thus would generate exactly the same query. In such a case, if one query is searched by the search system after another, and if the search system can recognize that the exact same query has been searched before, it can reduce the computational overhead of the second search by remembering the results of the first search query and returning those results for the second query. This operation is termed query caching in an embodiment of the present invention.

In one aspect, the search system can cache the completed query results. The results can be cached using a caching key storage unit, such as a look-up table which may be organized as a hash map data structure, for example. The keys for this look-up table are generated from query signatures, and search results are stored as values of the look-up table. The search system can employ a separate caching server or servers, or the caching functionality can be integrated at every search server. FIG. 15A illustrates an exemplary search process 1550 operating on a search server with the query caching functionality enabled on it. When a new query is received, at step 1552, the search server calculates a hash key for the query using a predetermined hashing function that operates on the query signatures. The hashing function used in step 1552 can be a checksum function such as an MD5 key generation function. The search server then looks for this hash key at step 1554 in the look-up table that it has maintained to store the query search results. For example, the look-up table may be organized to use a hash key as an address in the look-up table to access search results. In an alternative configuration, a content addressable memory (CAM) may store hash keys and associated search results. If the hash key is found in the look-up table, the server returns the results from the look-up table at step 1562. If the hash key is not found in the look-up table at decision step 1554, then the server performs an actual search against the reference multimedia database using the query signatures at step 1556. At step 1558, the results of this search are stored in the look-up table using the hash key generated in step 1552, and the results are returned to the query client at step 1560.
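A minimal sketch of process 1550 follows, using a Python dictionary as the look-up table and MD5 from the standard `hashlib` module as the hashing function. The class name, the `search_fn` callable standing in for the actual reference database search, and the assumption that signatures are fixed-length byte strings (so that concatenating them is unambiguous) are illustrative choices and not requirements of the described system.

```python
import hashlib


class CachingSearchServer:
    def __init__(self, search_fn):
        self.search_fn = search_fn   # actual search against the reference database
        self.table = {}              # look-up table: hash key -> search results
        self.searches = 0            # number of actual searches performed

    @staticmethod
    def hash_key(signatures):
        """Step 1552: derive one key from the query signatures (bytes)."""
        digest = hashlib.md5()
        for signature in signatures:
            digest.update(signature)
        return digest.hexdigest()

    def query(self, signatures):
        key = self.hash_key(signatures)
        if key in self.table:                    # step 1554: key found
            return self.table[key]               # step 1562
        self.searches += 1
        results = self.search_fn(signatures)     # step 1556
        self.table[key] = results                # step 1558
        return results                           # step 1560
```

Two clients submitting byte-identical signatures therefore trigger a single actual search; any difference in the signatures produces a different key and a cache miss.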

Note that instead of generating a single hash key for the look-up table, multiple hash keys can be generated using different sets of query signatures. For example, a hash key can be generated using signatures from a 5 second query. Then three hash keys could be generated for a query that is 15 seconds long by using non-overlapping signature blocks, each representing a duration of 5 seconds. While doing a search, even if only a single hash key out of the three hash keys is found in the look-up table, the result associated with that key is declared a matching result. The look-up table is searched and updated for all hash keys associated with every query. Even more hash keys can be generated using overlapping signature blocks and generating a hash key for each block of signatures. This method is termed generation of a sequence of hash keys.
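Generation of a sequence of hash keys may be sketched as below. The sketch assumes, purely for illustration, one fixed-length signature per second, so that a block of five signatures represents five seconds; the function names and the `step` parameter controlling block overlap are likewise hypothetical.

```python
import hashlib


def hash_key_sequence(signatures, block_len=5, step=None):
    """One MD5 key per block of block_len signatures (bytes).

    step == block_len (the default) gives non-overlapping blocks;
    a smaller step gives overlapping blocks and therefore more keys.
    """
    step = step or block_len
    keys = []
    for start in range(0, len(signatures) - block_len + 1, step):
        block = signatures[start:start + block_len]
        keys.append(hashlib.md5(b"".join(block)).hexdigest())
    return keys


def cached_lookup(table, keys):
    """Return the cached result of the first key found, else None."""
    for key in keys:
        if key in table:
            return table[key]
    return None


def cache_store(table, keys, result):
    """Update the look-up table for all keys of a query."""
    for key in keys:
        table[key] = result
```

Under the stated assumption, a 15 second query yields three keys with non-overlapping blocks and eleven keys with a step of one signature. A cache hit still requires that a block from the new query be byte-identical to a block from an earlier query.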

Also, note that the caching need not be performed only at the servers. The method of query caching involves generating a unique hash key or sequences of hash keys from the signatures. This can be performed at the querying client or any other device. Only this unique key or sequence of keys needs to be queried to find cached query results. No actual search needs to be performed if a cached result is found. For this reason, this caching based querying for matching content can be performed in various forms, such as querying a peer to peer (P2P) network that is made of individual query clients, querying a content delivery network (CDN), or querying a separate caching server that only maintains such cached results.

FIG. 15B illustrates a distributed search system process 1500 that incorporates centralized and distributed cache servers in accordance with the present invention. In FIG. 15B, process 1500 begins at step 1502, by generating signatures at a client for a unit of multimedia content to be identified. At step 1504, hash keys are generated by the client using the generated signatures. Also, at step 1504, the generated hash keys are sent to a centralized query cache server or a distributed cache server on a network, such as the P2P network.

If the search has been forwarded to the centralized cache server at step 1506, the centralized cache server checks for the results using the hash keys. If the results are found, the results are returned to the client 1502. However, if the results are not found at the centralized cache server, then the cache server forwards the search request to the centralized search system 1508 that performs the actual search using query signatures. The main search system 1508 then informs the query client 1502 of the actual search result. The main search system 1508 also informs the cache server 1506 about the results, which in turn updates its look-up table by associating the search results with the hash keys linked with this query signature.
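The interaction between the centralized cache server 1506 and the main search system 1508 might be sketched as below. The class names, the exact-match dictionary standing in for the reference database, and the decision to cache only successful results are assumptions of this sketch. It also simplifies the message flow: the result is relayed through the cache server rather than being sent to the client and the cache server separately.

```python
from typing import Dict, Optional, Sequence


class MainSearchSystem:
    """Stand-in for the centralized search system 1508."""

    def __init__(self, reference_index: Dict[bytes, str]) -> None:
        # Assumption: an exact-match dictionary replaces the real
        # similarity search over the reference database.
        self._index = reference_index
        self.searches = 0

    def search(self, signatures: Sequence[bytes]) -> Optional[str]:
        self.searches += 1
        for sig in signatures:
            if sig in self._index:
                return self._index[sig]
        return None


class CentralCacheServer:
    """Stand-in for the centralized cache server 1506."""

    def __init__(self, main: MainSearchSystem) -> None:
        self._main = main
        self._table: Dict[str, str] = {}

    def handle(self, hash_keys: Sequence[str], signatures: Sequence[bytes]) -> Optional[str]:
        # Check the look-up table using the hash keys sent by the client.
        for key in hash_keys:
            if key in self._table:
                return self._table[key]
        # Cache miss: forward the request so the actual search is performed.
        result = self._main.search(signatures)
        if result is not None:
            # Associate the result with every hash key of this query.
            for key in hash_keys:
                self._table[key] = result
        return result
```

After the first miss, a second request carrying any of the same hash keys is answered by the cache server alone, so the count of actual searches does not grow.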

If the search at step 1504 is forwarded to a distributed system to find the cached results, the distributed system returns with a result.

If the client 1502 decides to be a part of a distributed P2P caching network, it then stores the obtained results into a look-up table with the keys being the hash keys generated at step 1504. At a later time, a different client may produce a hash key that is the same as one of the hash keys produced by this client 1502. If that new client queries, using that hash key, a distributed network which the client 1502 has joined, the client 1502 can potentially reply with the matching result. Thus the P2P caching network can share some of the search load of the main search system.
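A toy model of the P2P caching behavior is given below. The `PeerClient` class and the `query_p2p` function are hypothetical, and the network is reduced to an in-process list of peers; a real P2P network would add peer discovery, routing, and transport, none of which is specified here.

```python
from typing import Dict, Iterable, List, Optional


class PeerClient:
    """A query client that has joined the P2P caching network."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.table: Dict[str, str] = {}  # local look-up table: hash key -> result

    def remember(self, hash_keys: Iterable[str], result: str) -> None:
        """Store an obtained result under each hash key of the query."""
        for key in hash_keys:
            self.table[key] = result

    def answer(self, hash_key: str) -> Optional[str]:
        """Reply to another client's query if the key is cached locally."""
        return self.table.get(hash_key)


def query_p2p(peers: List[PeerClient], hash_keys: Iterable[str]) -> Optional[str]:
    """Ask each peer for each key and return the first cached result found."""
    keys = list(hash_keys)
    for peer in peers:
        for key in keys:
            hit = peer.answer(key)
            if hit is not None:
                return hit
    return None
```

A query that returns `None` here would fall through to the main search system, which is how the peers take over only part of the search load.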

FIG. 15C illustrates a process 1518 executed at query clients for cache based multimedia content search in accordance with the present invention. At block 1520, the client generates signatures. At block 1522, a sequence of cache keys is generated; the cache keys are much smaller in length than the signatures generated at block 1520. At block 1524, the client queries the centralized search system having a reference database using the sequence of cache keys. At block 1525, the query client receives the results of the cache key search from the centralized search system. At block 1526, the query client determines if the results contain a match in response to the cache keys. If a match is found, the query client ends the process 1518. If the match is not found in the results using the cache keys at block 1526, the process 1518 proceeds to a block 1530. At block 1530, the query client performs a multimedia identification search query to the centralized search system using the signatures generated at block 1520. At block 1532, the query client receives the results of the signature search from the centralized search system and the query client ends the process 1518.
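The client-side control flow of process 1518 can be summarized in a short sketch. The function `identify` and its three callables are placeholders chosen for illustration; how cache keys are derived and how the centralized search system is contacted are left to the caller.

```python
from typing import Callable, List, Sequence, Tuple


def identify(
    signatures: Sequence[bytes],
    make_cache_keys: Callable[[Sequence[bytes]], List[str]],
    query_cache_keys: Callable[[List[str]], List[str]],
    query_signatures: Callable[[Sequence[bytes]], List[str]],
) -> Tuple[List[str], str]:
    """Return (results, path), where path records which query produced them."""
    cache_keys = make_cache_keys(signatures)   # block 1522
    results = query_cache_keys(cache_keys)     # blocks 1524 and 1525
    if results:                                # block 1526: match found
        return results, "cache"
    # Blocks 1530 and 1532: fall back to the full signature search.
    return query_signatures(signatures), "signature"
```

Because the cache keys are much shorter than the signatures, the first request is cheap to send, and the full signatures are transmitted only when the cache key search finds no match.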

Those of skill in the art will appreciate from the present disclosure additional, alternative systems and methods for actionable television event generation, based on television program audio and video content fingerprinting, in accordance with the disclosed principles of the present invention. Thus, while particular embodiments and applications of the present invention have been illustrated and described, it is to be understood that the invention is not limited to the precise construction and components disclosed herein and that various modifications, changes and variations which will be apparent to those of ordinary skill in the art may be made in the arrangement, operation and details of the method and apparatus of the present invention disclosed herein without departing from the spirit and scope of the invention.

We claim:
 1. A method of tracking multimedia content, the method comprising: searching in a remote database in response to a query sent from a user device for multimedia content of a multimedia program that is currently playing on the user device; finding matching signatures for the multimedia content in a reference program stored in the remote database; retrieving the matching signatures and reference signatures for succeeding segments of the reference program beginning from the point in the reference program the multimedia content was found in the remote database; storing the matching signatures and reference signatures in a local database on the user device; tracking the multimedia program in the local database to find matching content for the succeeding segments of the reference program; determining the multimedia content has changed to different multimedia content than that stored in the succeeding segments of the reference program; and replacing the determined different multimedia content to a preselected multimedia content.
 2. The method of claim 1 further comprising: playing the preselected multimedia content for a prespecified duration; and returning to play the multimedia program upon detecting matching content for subsequent segments of the reference program.
 3. The method of claim 1, wherein the preselected multimedia content is a new source of multimedia content.
 4. The method of claim 3, wherein the new source of multimedia content is a picture-in-picture mode of operation with another channel.
 5. The method of claim 1, wherein the preselected multimedia content is a reduction in the audio on the user device.
 6. The method of claim 1, wherein the succeeding segments of the reference program is a full reference program including the matching signatures from the point in the reference program the multimedia content was found in the remote database.
 7. The method of claim 1 further comprising: blocking the playing of the preselected multimedia content; and returning to display the multimedia program upon detecting matching content for subsequent segments of the reference program.
 8. The method of claim 1, wherein the searching in the remote database uses a similarity search method combined with a correlation search method.
 9. The method of claim 1, wherein the tracking to find matching content uses a correlation method.
 10. An apparatus for tracking multimedia content, the apparatus comprising: a user processing device configured for sending a query to a remote database to search for multimedia content of a multimedia program identified by a user, wherein matching signatures are found for the multimedia content in a reference program stored in the remote database, and the matching signatures and reference signatures for succeeding segments of the reference program beginning from the point in the reference program the multimedia content was found are retrieved from the remote database; and a local database coupled to the user processing device and configured for storing the matching signatures and reference signatures retrieved from the remote database, wherein the user processing device is further configured to track the multimedia program in the local database to find matching content for the succeeding segments of the reference program, the user processing device is further configured to determine the multimedia content has changed to different multimedia content than that stored in the succeeding segments of the reference program, and wherein the user processing device is further configured to replace the determined different multimedia content to a preselected multimedia content.
 11. The apparatus of claim 10, wherein the local database is stored in a memory internal to the user processing device.
 12. The apparatus of claim 10, wherein the local database is stored in a random access memory and a hard disk drive internal to the user processing device.
 13. The apparatus of claim 10, wherein each reference signature includes a cluster key comprising a link reference address to a list of signature records stored in the local database.
 14. The apparatus of claim 13, wherein the cluster key is generated by applying a fingerprinting process to bits of a reference fingerprint having a first number of bits, wherein the cluster key comprises a second number of bits that is less than the first number of bits.
 15. A computer readable non-transitory medium storing a computer program which causes a computer system to perform a method of tracking multimedia content, the method comprising: searching in a remote database in response to a query sent from a user device for multimedia content of a multimedia program that is currently playing on the user device; finding matching signatures for the multimedia content in a reference program stored in the remote database; retrieving the matching signatures and reference signatures for succeeding segments of the reference program beginning from the point in the reference program the multimedia content was found in the remote database; storing the matching signatures and reference signatures in a local database on the user device; tracking the multimedia program in the local database to find matching content for the succeeding segments of the reference program; determining the multimedia content has changed to different multimedia content than that stored in the succeeding segments of the reference program; and replacing the determined different multimedia content to a preselected multimedia content.
 16. The computer readable non-transitory medium method of claim 15 further comprising: playing the preselected multimedia content for a prespecified duration; and returning to play the multimedia program upon detecting matching content for subsequent segments of the reference program.
 17. The computer readable non-transitory medium method of claim 15, wherein the preselected multimedia content is a picture-in-picture mode of operation with another channel.
 18. The computer readable non-transitory medium method of claim 15 further comprising: blocking the playing of the preselected multimedia content; and returning to display the multimedia program upon detecting matching content for subsequent segments of the reference program.
 19. The computer readable non-transitory medium method of claim 15, wherein the searching in the remote database uses a similarity search method combined with a correlation search method.
 20. The computer readable non-transitory medium method of claim 15, wherein each reference signature includes a cluster key comprising a link reference address to a list of signature records stored in the local database.
 21. A method comprising: generating, by a client device, a first query signature representing a portion of a first content unit, wherein the portion of the first content unit is presented by the client device; sending, by the client device, to a server device, the generated first query signature; responsive to sending the generated first query signature, receiving, by the client device, from the server device, a set of reference signatures representing additional portions of the first content unit, wherein the set of reference signatures was selected by the server device based at least in part on the generated first query signature; generating, by the client device, a second query signature representing a portion of a second content unit that is different from the first content unit; comparing, by the client device, the generated second query signature with the received set of reference signatures; determining, by the client device, that the comparing resulted in no match being found; and responsive to determining that the comparing resulted in no match being found, performing, by the client device, an action.
 22. The method of claim 21, wherein the first content unitcomprises video content.
 23. The method of claim 22, wherein the firstcontent unit further comprises audio content.
 24. The method of claim21, wherein the client device is a television set and wherein presentedby the client device comprises displayed by the client device.
 25. Themethod of claim 21, wherein the portion of the second content unit isreceived but not presented by the client device.
 26. The method of claim21, wherein the first content unit is a television show and the secondcontent unit is an advertisement.
 27. The method of claim 21, whereincomparing the generated second query signature with the received set ofreference signatures comprises using a track search technique to comparethe generated second query signature with the received set of referencesignatures.
 28. The method of claim 21, wherein comparing the generatedsecond query signature with the received set of reference signaturescomprises using a full search query technique to compare the generatedsecond query signature with the received set of reference signatures.29. The method of claim 21, wherein comparing the generated second querysignature with the received set of reference signatures comprises usingboth a track search technique and a full search query technique tocompare the generated second query signature with the received set ofreference signatures.
 30. The method of claim 21, wherein performing theaction comprises: using user-preference data associated with a user ofthe client device to identify a user-preferred action; and performingthe identified user-preferred action.
 31. The method of claim 21,wherein performing the action comprises at least temporarily blockingpresentation of the second content unit.
 32. The method of claim 21,wherein performing the action comprises blocking presentation of thesecond content unit for a duration of the second content unit.
 33. Themethod of claim 21, wherein performing the action comprises switching toa different channel on which to receive and present content.
 34. Themethod of claim 21, wherein performing the action comprises switching topresenting different content.
 35. The method of claim 21, wherein theclient device is a television set and wherein performing the actioncomprises initiating a picture-in-picture mode of the television set.36. The method of claim 21, wherein performing the action comprisesreducing an audio volume level of the client device.
 37. Anon-transitory computer-readable storage medium having stored thereonprogram instructions that, when executed by one or more processors of aclient device, cause the client device to perform operations comprising:generating a first query signature representing a portion of a firstcontent unit, wherein the portion of the first content unit is presentedby the client device; sending, to a server device, the generated firstquery signature; responsive to sending the generated first querysignature, receiving, from the server device, a set of referencesignatures representing additional portions of the first content unit,wherein the set of reference signatures was selected by the serverdevice based at least in part on the generated first query signature;generating a second query signature representing a portion of a secondcontent unit that is different from the first content unit; comparingthe generated second query signature with the received set of referencesignatures; determining that the comparing resulted in no match beingfound; and responsive to determining that the comparing resulted in nomatch being found, performing an action.
 38. The non-transitorycomputer-readable storage medium of claim 37, wherein the first contentunit comprises video content.
 39. The non-transitory computer-readablestorage medium of claim 38, wherein the first content unit furthercomprises audio content.
 40. The non-transitory computer-readablestorage medium of claim 37, wherein the client device is a televisionset and wherein presented by the client device comprises displayed bythe client device.
 41. The non-transitory computer-readable storagemedium of claim 37, wherein the portion of the second content unit isreceived but not presented by the client device.
 42. The non-transitorycomputer-readable storage medium of claim 37, wherein the first contentunit is a television show and the second content unit is anadvertisement.
 43. The non-transitory computer-readable storage mediumof claim 37, wherein comparing the generated second query signature withthe received set of reference signatures comprises using a track searchtechnique to compare the generated second query signature with thereceived set of reference signatures.
 44. The non-transitorycomputer-readable storage medium of claim 37, wherein comparing thegenerated second query signature with the received set of referencesignatures comprises using a full search query technique to compare thegenerated second query signature with the received set of referencesignatures.
 45. The non-transitory computer-readable storage medium ofclaim 37, wherein comparing the generated second query signature withthe received set of reference signatures comprises using both a tracksearch technique and a full search query technique to compare thegenerated second query signature with the received set of referencesignatures.
 46. The non-transitory computer-readable storage medium ofclaim 37, wherein performing the action comprises: using user-preferencedata associated with a user of the client device to identify auser-preferred action; and performing the identified user-preferredaction.
 47. The non-transitory computer-readable storage medium of claim37, wherein performing the action comprises at least temporarilyblocking presentation of the second content unit.
 48. The non-transitorycomputer-readable storage medium of claim 37, wherein performing theaction comprises blocking presentation of the second content unit for aduration of the second content unit.
 49. The non-transitorycomputer-readable storage medium of claim 37, wherein performing theaction comprises switching to a different channel on which to receiveand present content.
 50. The non-transitory computer-readable storagemedium of claim 37, wherein performing the action comprises switching topresenting different content.
 51. The non-transitory computer-readablestorage medium of claim 37, wherein the client device is a televisionset and wherein performing the action comprises initiating apicture-in-picture mode of the television set.
 52. The non-transitorycomputer-readable storage medium of claim 37, wherein performing theaction comprises reducing an audio volume level of the client device.53. A client device comprising one or more processors and anon-transitory computer-readable storage medium having stored thereonprogram instructions that, when executed by the one or more processors,cause the client device to perform operations comprising: generating afirst query signature representing a portion of a first content unit,wherein the portion of the first content unit is presented by the clientdevice; sending, to a server device, the generated first querysignature; responsive to sending the generated first query signature,receiving, from the server device, a set of reference signaturesrepresenting additional portions of the first content unit, wherein theset of reference signatures was selected by the server device based atleast in part on the generated first query signature; generating asecond query signature representing a portion of a second content unitthat is different from the first content unit; comparing the generatedsecond query signature with the received set of reference signatures;determining that the comparing resulted in no match being found; andresponsive to determining that the comparing resulted in no match beingfound, performing an action.
 54. The client device of claim 53, whereinthe first content unit comprises video content.
 55. The client device ofclaim 54, wherein the first content unit further comprises audiocontent.
 56. The client device of claim 53, wherein the client device isa television set and wherein presented by the client device comprisesdisplayed by the client device.
 57. The client device of claim 53,wherein the portion of the second content unit is received but notpresented by the client device.
 58. The client device of claim 53,wherein the first content unit is a television show and the secondcontent unit is an advertisement.
 59. The client device of claim 53,wherein comparing the generated second query signature with the receivedset of reference signatures comprises using a track search technique tocompare the generated second query signature with the received set ofreference signatures.
 60. The client device of claim 53, whereincomparing the generated second query signature with the received set ofreference signatures comprises using a full search query technique tocompare the generated second query signature with the received set ofreference signatures.
 61. The client device of claim 53, whereincomparing the generated second query signature with the received set ofreference signatures comprises using both a track search technique and afull search query technique to compare the generated second querysignature with the received set of reference signatures.
 62. The clientdevice of claim 53, wherein performing the action comprises: usinguser-preference data associated with a user of the client device toidentify a user-preferred action; and performing the identifieduser-preferred action.
 63. The client device of claim 53, whereinperforming the action comprises at least temporarily blockingpresentation of the second content unit.
 64. The client device of claim53, wherein performing the action comprises blocking presentation of thesecond content unit for a duration of the second content unit.
 65. Theclient device of claim 53, wherein performing the action comprisesswitching to a different channel on which to receive and presentcontent.
 66. The client device of claim 53, wherein performing theaction comprises switching to presenting different content.
 67. Theclient device of claim 53, wherein the client device is a television setand wherein performing the action comprises initiating apicture-in-picture mode of the television set.
 68. The client device ofclaim 53, wherein performing the action comprises reducing an audiovolume level of the client device.
 69. A method comprising: receiving,by a server device, from a client device, a first query signature thatwas generated by the client device and that represents a portion of afirst content unit, wherein the client device presented the portion ofthe first content unit; based at least in part on the received firstquery signature, selecting a set of reference signatures representingadditional portions of the first content unit; and sending, by theserver device, to the client device, the selected set of referencesignatures to facilitate the client device: (i) comparing a second querysignature generated by the client device and that represents a portionof a second content unit that is different from the first content unitwith the set of reference signatures; (ii) determining that thecomparing resulted in no match being found, and (iii) responsive todetermining that the comparing resulted in no match being found,performing an action.
 70. The method of claim 69, wherein the firstcontent unit comprises video content.
 71. The method of claim 70,wherein the first content unit further comprises audio content.
 72. Themethod of claim 69, wherein the client device is a television set andwherein presented the portion of the first content unit comprisesdisplayed the portion of the first content unit.
 73. The method of claim69, wherein the portion of the second content unit is received but notpresented by the client device.
 74. The method of claim 69, wherein thefirst content unit is a television show and the second content unit isan advertisement.
 75. The method of claim 69, wherein comparing thesecond query signature with the set of reference signatures comprisesusing a track search technique to compare the second query signaturewith the set of reference signatures.
 76. The method of claim 69,wherein comparing the second query signature with the set of referencesignatures comprises using a full search query technique to compare thesecond query signature with the set of reference signatures.
 77. Themethod of claim 69, wherein comparing the second query signature withthe set of reference signatures comprises using both a track searchtechnique and a full search query technique to compare the second querysignature with the set of reference signatures.
 78. The method of claim69, wherein performing the action comprises: using user-preference dataassociated with a user of the client device to identify a user-preferredaction; and performing the identified user-preferred action.
 79. Themethod of claim 69, wherein performing the action comprises at leasttemporarily blocking presentation of the second content unit.
 80. Themethod of claim 69, wherein performing the action comprises blockingpresentation of the second content unit for a duration of the secondcontent unit.
 81. The method of claim 69, wherein performing the actioncomprises switching to a different channel on which to receive andpresent content.
 82. The method of claim 69, wherein performing theaction comprises switching to presenting different content.
 83. Themethod of claim 69, wherein the client device is a television set andwherein performing the action comprises initiating a picture-in-picturemode of the television set.
 84. The method of claim 69, whereinperforming the action comprises reducing an audio volume level of theclient device.
 85. A non-transitory computer-readable storage mediumhaving stored thereon program instructions that, when executed by one ormore processors of a server device, cause the server device to performoperations comprising: receiving, from a client device, a first querysignature that was generated by the client device and that represents aportion of a first content unit, wherein the client device presented theportion of the first content unit; based at least in part on thereceived first query signature, selecting a set of reference signaturesrepresenting additional portions of the first content unit; and sending,to the client device, the selected set of reference signatures tofacilitate the client device: (i) comparing a second query signaturegenerated by the client device and that represents a portion of a secondcontent unit that is different from the first content unit with the setof reference signatures; (ii) determining that the comparing resulted inno match being found, and (iii) responsive to determining that thecomparing resulted in no match being found, performing an action. 86.The non-transitory computer-readable storage medium of claim 85, whereinthe first content unit comprises video content.
 87. The non-transitorycomputer-readable storage medium of claim 86, wherein the first contentunit further comprises audio content.
 88. The non-transitorycomputer-readable storage medium of claim 85, wherein the client deviceis a television set and wherein presented the portion of the firstcontent unit comprises displayed the portion of the first content unit.89. The non-transitory computer-readable storage medium of claim 85,wherein the portion of the second content unit is received but notpresented by the client device.
 90. The non-transitory computer-readablestorage medium of claim 85, wherein the first content unit is atelevision show and the second content unit is an advertisement.
 91. Thenon-transitory computer-readable storage medium of claim 85, whereincomparing the second query signature with the set of referencesignatures comprises using a track search technique to compare thesecond query signature with the set of reference signatures.
 92. Thenon-transitory computer-readable storage medium of claim 85, whereincomparing the second query signature with the set of referencesignatures comprises using a full search query technique to compare thesecond query signature with the set of reference signatures.
 93. Thenon-transitory computer-readable storage medium of claim 85, whereincomparing the second query signature with the set of referencesignatures comprises using both a track search technique and a fullsearch query technique to compare the second query signature with theset of reference signatures.
 94. The non-transitory computer-readablestorage medium of claim 85, wherein performing the action comprises:using user-preference data associated with a user of the client deviceto identify a user-preferred action; and performing the identifieduser-preferred action.
 95. The non-transitory computer-readable storagemedium of claim 85, wherein performing the action comprises at leasttemporarily blocking presentation of the second content unit.
 96. Thenon-transitory computer-readable storage medium of claim 85, whereinperforming the action comprises blocking presentation of the secondcontent unit for a duration of the second content unit.
 97. Thenon-transitory computer-readable storage medium of claim 85, whereinperforming the action comprises switching to a different channel onwhich to receive and present content.
 98. The non-transitorycomputer-readable storage medium of claim 85, wherein performing theaction comprises switching to presenting different content.
 99. Thenon-transitory computer-readable storage medium of claim 85, wherein theclient device is a television set and wherein performing the actioncomprises initiating a picture-in-picture mode of the television set.100. The non-transitory computer-readable storage medium of claim 85,wherein performing the action comprises reducing an audio volume levelof the client device.
 101. A server device comprising one or moreprocessors and a non-transitory computer-readable storage medium havingstored thereon program instructions that, when executed by the one ormore processors, cause the server device to perform operationscomprising: receiving, from a client device, a first query signaturethat was generated by the client device and that represents a portion ofa first content unit, wherein the client device presented the portion ofthe first content unit; based at least in part on the received firstquery signature, selecting a set of reference signatures representingadditional portions of the first content unit; and sending, to theclient device, the selected set of reference signatures to facilitatethe client device: (i) comparing a second query signature generated bythe client device and that represents a portion of a second content unitthat is different from the first content unit with the set of referencesignatures; (ii) determining that the comparing resulted in no matchbeing found, and (iii) responsive to determining that the comparingresulted in no match being found, performing an action.
 102. The serverdevice of claim 101, wherein the first content unit comprises videocontent.
 103. The server device of claim 102, wherein the first contentunit further comprises audio content.
 104. The server device of claim101, wherein the client device is a television set and wherein presentedthe portion of the first content unit comprises displayed the portion ofthe first content unit.
 105. The server device of claim 101, wherein theportion of the second content unit is received but not presented by theclient device.
 106. The server device of claim 101, wherein the first content unit is a television show and the second content unit is an advertisement.
 107. The server device of claim 101, wherein comparing the second query signature with the set of reference signatures comprises using a track search technique to compare the second query signature with the set of reference signatures.
 108. The server device of claim 101, wherein comparing the second query signature with the set of reference signatures comprises using a full search query technique to compare the second query signature with the set of reference signatures.
 109. The server device of claim 101, wherein comparing the second query signature with the set of reference signatures comprises using both a track search technique and a full search query technique to compare the second query signature with the set of reference signatures.
 110. The server device of claim 101, wherein performing the action comprises: using user-preference data associated with a user of the client device to identify a user-preferred action; and performing the identified user-preferred action.
 111. The server device of claim 101, wherein performing the action comprises at least temporarily blocking presentation of the second content unit.
 112. The server device of claim 101, wherein performing the action comprises blocking presentation of the second content unit for a duration of the second content unit.
 113. The server device of claim 101, wherein performing the action comprises switching to a different channel on which to receive and present content.
 114. The server device of claim 101, wherein performing the action comprises switching to presenting different content.
 115. The server device of claim 101, wherein the client device is a television set and wherein performing the action comprises initiating a picture-in-picture mode of the television set.
 116. The server device of claim 101, wherein performing the action comprises reducing an audio volume level of the client device.
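
By way of illustration only, the exchange recited in claim 101 may be sketched as follows. This is a minimal sketch under stated assumptions and is not the claimed implementation: signatures are modeled as small integers compared by Hamming distance, the action is modeled as the volume reduction of claim 116, and the names `ReferenceServer`, `TrackingClient`, `select_references`, `check`, and `max_distance` are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two integer signatures."""
    return bin(a ^ b).count("1")


@dataclass
class ReferenceServer:
    # Hypothetical reference database: content unit id -> ordered signatures.
    references: Dict[str, List[int]]
    max_distance: int = 2  # assumed match threshold, in bits

    def select_references(self, query_sig: int) -> List[int]:
        """Find the content unit matching the query signature and return
        the signatures of its other (additional) portions."""
        for sigs in self.references.values():
            for i, sig in enumerate(sigs):
                if hamming(sig, query_sig) <= self.max_distance:
                    return sigs[:i] + sigs[i + 1:]
        return []  # no content unit matched the query


@dataclass
class TrackingClient:
    max_distance: int = 2
    volume: int = 10
    tracked: List[int] = field(default_factory=list)

    def load(self, sigs: List[int]) -> None:
        """Store the reference signatures received from the server."""
        self.tracked = list(sigs)

    def check(self, query_sig: int) -> bool:
        """Compare a new query signature against the tracked set; when no
        match is found, perform an action (here: reduce the volume)."""
        matched = any(hamming(s, query_sig) <= self.max_distance
                      for s in self.tracked)
        if not matched:
            self.volume = max(0, self.volume - 5)
        return matched


# Illustrative data only: one "show" with three portions and one "ad".
server = ReferenceServer({"show": [0b11110000, 0b11001100, 0b10101010],
                          "ad": [0b00001111]})
client = TrackingClient()
client.load(server.select_references(0b11110001))  # first query signature
still_show = client.check(0b11001101)  # close to a tracked show portion
is_ad = client.check(0b00001111)       # ad signature: no tracked match
```

In this sketch the client keeps only the signatures of the content unit it is currently presenting, so detecting a change of content is a local comparison that does not require a further round trip to the server.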